Introduction

Increasing  use  of  genomic  discovery  efforts  in  patients  with  bone  marrow 
failure  due  to  myelodysplastic  syndrome  (MDS)  has  led  to  the  rapid  discovery  of  a  series 
of  recurrent  genetic  abnormalities  underlying  these  disorders.  Remarkably,  a  large 
number  of  these  alterations  appear  to  be  in  genes  whose  function  is  known,  or 
suspected, to be involved in epigenetic regulation of gene transcription. In the last 3
years alone, mutations in the genes TET2, ASXL1, DNMT3A, and EZH2 have all been
found to be frequent among patients with MDS. Mutations in several of
these  genes  have  proven  to  be  important  markers  of  disease  outcome  with  ASXL1  and 
EZH2  mutations  recurrently  being  identified  as  adverse  prognosticators  in  MDS  patients. 
Identification of frequent mutations in epigenetic modifiers has also highlighted the fact
that a number of these genes encode enzymes and/or result in alterations in enzymatic
activity that may represent novel, tractable therapeutic targets for MDS patients. In
this proposal, we originally aimed to determine (a) whether mice with genetically engineered
deletion of epigenetic modifiers mutated in MDS would serve as valuable murine models
of MDS, (b) whether mutations in epigenetic modifiers specifically impact DNA methylation
and/or histone post-translational modifications in a manner that is therapeutically
targetable, and (c) whether additional mutations exist in patients with specific subsets of
MDS with the worst clinical outcome. Since the proposal was awarded, we have gained
major insights into the epigenomic function of ASXL1 as well as the biological impact of
conditional deletion of Asxl1 alone and in combination with other genetic alterations,
including Tet2 deletion and NRasG12D overexpression. In addition, we have recently
identified that an additional class of very frequent mutations in MDS patients, those affecting
the spliceosome, impacts EZH2 function. This work has resulted in several publications,
multiple  oral  presentations  at  national  meetings,  and  has  been  used  as  the  basis  for 
several  additional  foundation  awards  (from  Damon  Runyon  Foundation,  the  V 
Foundation,  and  the  Evans  Foundation)  and  is  the  basis  for  an  NIH  R01  application  I 
have  pending. 

Keywords: 

5-azacytidine, ASXL1, Decitabine, Epigenetics, EZH2, Genomics, Mouse models,
Myelodysplastic  Syndromes,  Splicing,  SRSF2,  TET2. 

Accomplishments 

Key  Research  Accomplishments 

• Developed and published the first conditional knockout mouse for Asxl1
as well as the first murine model with combined Asxl1 and Tet2 deletion.

We  believe  these  models  are  valuable  genetically  accurate  murine 
models  of  acquired  bone  marrow  failure. 

• Identified the biological effects of Asxl1 loss on hematopoiesis, alone and
in  combination  with  other  co-occurring  genetic  alterations. 

•  Generated  the  first  murine  model  of  spliceosomal  mutations  as  seen  in 
patients  with  MDS. 

• Identified an important intersection between spliceosomal gene alterations and
the epigenome of MDS.

In  addition  to  the  above  summary,  below  is  a  more  detailed  summary  of 
accomplishments  organized  by  Tasks  from  the  original  grant  submission: 
Task 1. “Obtain DoD ACURO approval for the use of animals in the experiments
outlined  below  in  Tasks  2  to  4.” 

We  have  nearly  completed  DoD  ACURO  approval  for  all  experiments  related  to  this 
award.  We  are  awaiting  final  confirmation  on  approval  from  DoD  currently. 

Task 2. “Complete characterization of mice with conditional deletion of Asxl1 alone and
Asxl1 combined with Tet2 (Months 1-24) at the work performance site of Memorial
Sloan-Kettering  Cancer  Center.” 

As noted in our annual review in 2014, we completed generation of mice with deletion of
Asxl1, Tet2, or both using multiple different Cre recombinases. This work was
published in 2013 in the Journal of Experimental Medicine (Abdel-Wahab, O, et al. J
Exp Med 2013 Nov 18;210(12):2641-59), and the mice have been used by the MDS research
community internationally. We have deposited these mice at the Jackson Laboratory for
public use.

Task  3.  Continue  development  of  mice  with  Ezh2  deletion  alone  and  characterize  mice 
with  compound  deletion  of  Ezh2/Tet2  and  Ezh2/Asxl1  (Months  1-24)  at  the  work 
performance  site  of  Memorial  Sloan-Kettering  Cancer  Center. 

We recently generated mice with Ezh2 deletion in the postnatal compartment (Mx1-cre
Ezh2fl/fl mice) and mice with compound deletion of Ezh2 and Asxl1. From these murine
models we have identified that:

(i)  Hematopoietic  stem  cells  (HSCs)  from  mice  with  compound  Asxl1/Ezh2  loss 
have  impaired  self-renewal  compared  with  HSCs  from  littermate  control  mice 
as  well  as  mice  with  deletion  of  either  gene  alone. 

(ii) A high proportion of wildtype mice reconstituted with bone marrow from mice
with compound Asxl1/Ezh2 (Mx1-cre Asxl1fl/fl Ezh2fl/fl) deletion die of bone
marrow failure within weeks of deletion of these genes. Surviving mice are
characterized  by  anemia  and  leukopenia  as  well  as  morphologic  dysplasia. 

The above phenotypes of mice with compound deletion of both Asxl1 and Ezh2 are
dramatic, and we are now working to understand the mechanism by which
deletion of these 2 genes impairs HSC function.

In addition to the above, we have recently made the unexpected observation that
mutations in the spliceosomal protein SRSF2, commonly identified in MDS patients,
result in mis-splicing of EZH2. Interestingly, SRSF2 mutations and loss-of-function
EZH2 mutations in MDS are 100% mutually exclusive, but the functional basis for this
relationship was not previously known. Our work provides a mechanistic basis for this
observation and identifies another mechanism by which EZH2 is dysregulated in MDS. These data
were recently published in the following manuscript (see Appendix #1):

Kim E, Ilagan JO, Liang Y, Daubner GM, Lee SC, Ramakrishnan A, Li Y, Chung YR,
Micol JB, Murphy ME, Cho H, Kim MK, Zebari AS, Aumann S, Park CY, Buonamici S,
Smith PG, Deeg HJ, Lobry C, Aifantis I, Modis Y, Allain FH, Halene S, Bradley RK,
Abdel-Wahab O. SRSF2 Mutations Contribute to Myelodysplasia by Mutant-Specific
Effects on Exon Recognition. Cancer Cell. 2015 May 11;27(5):617-30. doi:
10.1016/j.ccell.2015.04.006. PubMed PMID: 25965569; PubMed Central PMCID:
PMC4429920.
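
Mutual exclusivity between two mutations, such as that described above for SRSF2 and EZH2, is commonly quantified with a one-sided Fisher's exact test on a 2x2 table of patient counts. The following is a minimal sketch using only the Python standard library; the function name and the patient counts are hypothetical placeholders for illustration and are not data from this report.

```python
from math import comb


def fisher_exact_less(a: int, b: int, c: int, d: int) -> float:
    """One-sided (lower-tail) Fisher's exact test for a 2x2 table.

    Table layout (patient counts):
                        EZH2 mutant   EZH2 wild-type
      SRSF2 mutant           a              b
      SRSF2 wild-type        c              d

    Returns P(co-mutated count <= a) under independence, i.e. the
    probability of seeing this few (or fewer) co-mutated patients by chance.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)
    k_min = max(0, col1 - row2)  # smallest feasible co-mutated count
    p = 0.0
    for k in range(k_min, a + 1):
        # Hypergeometric probability of exactly k co-mutated patients.
        p += comb(row1, k) * comb(row2, col1 - k) / denom
    return p


# Hypothetical counts for illustration only (not taken from the report).
p_value = fisher_exact_less(a=0, b=30, c=12, d=158)
print(f"one-sided p = {p_value:.4f}")
```

A small p-value here would indicate that the two mutations co-occur less often than expected if they were independent; with small cohorts, complete mutual exclusivity can still yield a modest p-value.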

Task 4. Determine the epigenetic contribution of Asxl1 and Ezh2 loss to bone marrow
failure  through  Chromatin  immunoprecipitation  (ChIP)  of  histone  H3  lysine  27  trimethyl 
(H3K27me3)  followed  by  next-generation  sequencing  in  primary  murine  hematopoietic 
cells  (Months  1-24)  at  the  work  performance  site  of  Memorial  Sloan-Kettering  Cancer 
Center. 

As  noted  in  2  prior  annual  reports,  we  have  completed  detailed  characterization  of  the 
effects  of  ASXL1  mutations  and  loss  using  cell  lines  and  primary  cells  from  knockout 
mice. These results have now been published in 2 papers (Abdel-Wahab, O, et al.
Cancer Cell 2012 and Abdel-Wahab, O, et al. J Exp Med 2013).

Task 5: Determine the effect of Tet2, Asxl1, and Ezh2 loss on response to a panel of currently
clinically utilized compounds in patients with MDS. Drug panel will include decitabine,
5-azacytidine, lenalidomide, cytarabine, daunorubicin, HDACi (vorinostat, romidepsin,
panobinostat, AR-42, trichostatin A), HSP-90 inhibitors (AUY-922, PU-H71), and
parthenolide (Months 1-24) at the work performance site of Memorial Sloan-Kettering
Cancer Center.

We are now performing these experiments ex vivo using methylcellulose colony
assays. In brief, hematopoietic stem/progenitor cells (HSPCs; lineage-negative Sca1+
c-KIT+ cells) from Tet2 knockout, Asxl1 knockout, Ezh2 knockout, and Tet2/Asxl1 double
knockout mice are being plated in methylcellulose with a variety of the above
compounds for 7 days. We are evaluating the effects of these compounds on restoring
colony formation (for Asxl1 and Ezh2 knockout HSPCs) or reducing colony formation (for
Tet2 and Tet2/Asxl1 knockout HSPCs). This work is underway.
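
The readout described above can be sketched as a fold change in colony number relative to a vehicle-treated control, interpreted according to the direction of effect sought for each genotype. This is an illustrative sketch only: the function names, genotype labels, and the use of a simple treated/vehicle ratio are assumptions, not the scoring method used in these experiments.

```python
# Desired direction of drug effect per genotype, as stated in the text:
# restore colony formation for Asxl1 and Ezh2 knockout HSPCs,
# reduce colony formation for Tet2 and Tet2/Asxl1 knockout HSPCs.
DESIRED_EFFECT = {
    "Asxl1 KO": "restore",
    "Ezh2 KO": "restore",
    "Tet2 KO": "reduce",
    "Tet2/Asxl1 DKO": "reduce",
}


def fold_change(treated_colonies: float, vehicle_colonies: float) -> float:
    """Colony number with compound relative to vehicle control."""
    if vehicle_colonies <= 0:
        raise ValueError("vehicle colony count must be positive")
    return treated_colonies / vehicle_colonies


def moves_in_desired_direction(genotype: str, treated: float, vehicle: float) -> bool:
    """True if the compound shifts colony number in the direction sought."""
    fc = fold_change(treated, vehicle)
    if DESIRED_EFFECT[genotype] == "restore":
        return fc > 1.0
    return fc < 1.0


# Hypothetical colony counts for illustration only.
print(moves_in_desired_direction("Asxl1 KO", treated=30, vehicle=10))
print(moves_in_desired_direction("Tet2 KO", treated=30, vehicle=10))
```

In practice, replicate plates and a statistical comparison would be needed before calling an effect; the ratio above only captures direction.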

In addition to the above preclinical experiments, we have recently completed a study
analyzing the impact of (i) common mutations in MDS and (ii) patterns of genome-wide
DNA methylation on response to decitabine (DAC) treatment. This was performed on a
uniformly treated cohort of 40 patients. Although we did not find any association between
mutations and response to decitabine, using the methylation profiles we developed an
epigenetic classifier that accurately predicted DAC response at the time of diagnosis.
This work was recently published as follows (see Appendix #2):

Meldi K, Qin T, Buchi F, Droin N, Sotzen J, Micol JB, Selimoglu-Buet D, Masala E,
Allione B, Gioia D, Poloni A, Lunghi M, Solary E, Abdel-Wahab O, Santini V, Figueroa
ME. Specific molecular signatures predict decitabine response in chronic
myelomonocytic leukemia. J Clin Invest. 2015 May;125(5):1857-72. doi:
10.1172/JCI78752. Epub 2015 Mar 30. PubMed PMID: 25822018.

Task  5:  Perform  candidate  gene  and  exome  sequencing  on  DNA  samples  from  20  MDS 
patients  with  ASXL1  mutations  alone  (Months  1-6)  at  the  work  performance  site  of 
Memorial  Sloan-Kettering  Cancer  Center. 

In  order  to  complete  this  task  and  to  inform  task  #5,  we  recently  performed  targeted  DNA 
sequencing  on  pretreatment  DNA  samples  from  a  cohort  of  MDS  patients  uniformly 
treated  with  decitabine.  This  work,  performed  in  collaboration  with  MDS  clinical  expert 
Dr.  Valeria  Santini,  revealed  that  ASXL1  mutations  frequently  co-occur  with  mutations  in 
the spliceosome-associated protein SRSF2 in patients with MDS/MPN overlap
syndromes. This interesting finding suggests an interaction between mutations in
epigenetic regulators and mutations in the spliceosome. Moreover, this work has resulted in one
recent publication as noted above (in “Task 5”).

Task  6:  Perform  candidate  gene  and  exome  sequencing  on  DNA  samples  from  40 
patients  with  MDS  accompanied  by  moderate  to  severe  bone  marrow  fibrosis  (Months  1- 
6)  at  the  work  performance  site  of  Memorial  Sloan-Kettering  Cancer  Center. 

We  have  now  collected  samples  from  40  such  patients  with  MDS  with  bone  marrow 
fibrosis  and  hope  to  begin  performing  DNA  sequencing  soon.  We  recently  helped  to 
generate  a  DNA  next-generation  sequencing  panel  of  300  genes  implicated  in  cancer 
pathogenesis  at  our  institution.  We  will  apply  this  sequencing  platform  to  these  MDS 
samples  with  the  hopes  of  characterizing  any  novel  mutations  associated  with  this 
unique  subtype  of  MDS. 

Task  7:  Present  findings  at  national  meetings  and  publish  in  peer-reviewed  journals 
(Month  6-36). 

I  have  given  15  presentations  at  national/international  meetings  on  the  work  performed 
with  funding  from  this  award  in  the  last  year  (see  list  of  presentations  in  Products 
below). 

I  have  also  been  invited  to  write  several  reviews  related  to  the  work  described  in  this 
proposal  in  well-respected  journals  including  Journal  of  Clinical  Investigation  (cited  in 
Products  below). 

Impact 

Genomic  discovery  efforts  in  patients  with  MDS  have  revealed  that  the  most  frequent 
somatic  mutations  in  these  disorders  are  in  genes  involved  in  either  epigenetic 
regulation  or  RNA  splicing.  We  and  others  have  recently  shown  that  mutations  in  the 
Polycomb-associated gene ASXL1 and the spliceosomal gene SRSF2 have adverse
prognostic importance across myeloid malignancies including MDS, acute
myeloid leukemia (AML), chronic myelomonocytic leukemia (CMML), and primary
myelofibrosis. We therefore have focused on understanding the role of these mutations
in  MDS  pathogenesis.  In  brief,  we  have  identified  that  the  loss-of-function  mutations  in 
ASXL1  as  well  as  the  gain-of-function  mutations  in  SRSF2  both  converge  on  decreased 
function  of  the  Polycomb  Repressive  Complex  2  (PRC2).  This  work  has  resulted  in 
multiple  genetically  accurate  models  of  MDS  as  well  as  reagents  to  screen  for  novel 
therapeutic targets for TET2-, ASXL1-, or SRSF2-mutant cells.


Changes/Problems 

Nothing  to  report. 

Products 

Original  Manuscripts: 

1: Kim E, Ilagan JO, Liang Y, Daubner GM, Lee SC, Ramakrishnan A, Li Y, Chung YR,
Micol JB, Murphy ME, Cho H, Kim MK, Zebari AS, Aumann S, Park CY, Buonamici S,
Smith PG, Deeg HJ, Lobry C, Aifantis I, Modis Y, Allain FH, Halene S, Bradley RK,
Abdel-Wahab O. SRSF2 Mutations Contribute to Myelodysplasia by Mutant-Specific
Effects on Exon Recognition. Cancer Cell. 2015 May 11;27(5):617-30. doi:
10.1016/j.ccell.2015.04.006. PubMed PMID: 25965569; PubMed Central PMCID:
PMC4429920.

2: Meldi K, Qin T, Buchi F, Droin N, Sotzen J, Micol JB, Selimoglu-Buet D, Masala E,
Allione B, Gioia D, Poloni A, Lunghi M, Solary E, Abdel-Wahab O, Santini V, Figueroa
ME. Specific molecular signatures predict decitabine response in chronic
myelomonocytic leukemia. J Clin Invest. 2015 May;125(5):1857-72. doi:
10.1172/JCI78752. Epub 2015 Mar 30. PubMed PMID: 25822018.

3: Guzman ML, Yang N, Sharma KK, Balys M, Corbett CA, Jordan CT, Becker MW,
Steidl U, Abdel-Wahab O, Levine RL, Marcucci G, Roboz GJ, Hassane DC. Selective
activity of the histone deacetylase inhibitor AR-42 against leukemia stem cells: a novel
potential strategy in acute myelogenous leukemia. Mol Cancer Ther. 2014
Aug;13(8):1979-90. doi: 10.1158/1535-7163.MCT-13-0963. Epub 2014 Jun 16. PubMed
PMID: 24934933; PubMed Central PMCID: PMC4383047.
25244089; PubMed Central PMCID: PMC4191026.
modifiers in cancers. Biochem Biophys Res Commun. 2014 Dec 5;455(1-2):24-34. doi:

SUMMARY

Mutations  affecting  spliceosomal  proteins  are  the  most  common  mutations  in  patients  with  myelodysplastic 
syndromes  (MDS),  but  their  role  in  MDS  pathogenesis  has  not  been  delineated.  Here  we  report  that  mutations 
affecting  the  splicing  factor  SRSF2  directly  impair  hematopoietic  differentiation  in  vivo,  which  is  not  due  to 
SRSF2  loss  of  function.  By  contrast,  SRSF2  mutations  alter  SRSF2’s  normal  sequence-specific  RNA  binding 
activity, thereby altering the recognition of specific exonic splicing enhancer motifs to drive recurrent
mis-splicing of key hematopoietic regulators. This includes SRSF2 mutation-dependent splicing of EZH2, which
triggers nonsense-mediated decay, which, in turn, results in impaired hematopoietic differentiation. These
data provide a mechanistic link between a mutant spliceosomal protein, alterations in the splicing of key
regulators, and impaired hematopoiesis.


Significance

Frequent somatic mutations affecting components of the spliceosome have been identified in hematologic malignancies;
however, the functional role of these mutations is not known. Here we identify that commonly occurring mutations in the
spliceosomal gene SRSF2 impair hematopoietic differentiation and promote myelodysplasia by altering SRSF2’s
preference for specific exonic splicing enhancer motifs. This results in consistent mis-splicing in a manner that promotes the
expression of abnormal isoforms of a number of key hematopoietic regulators, some of which have been linked previously
to leukemogenesis (including BCOR and EZH2). These data provide a mechanistic basis for the enrichment of spliceosomal
mutations in myelodysplasia and identify altered RNA recognition as an important driver of leukemogenesis.

INTRODUCTION

Somatic mutations in genes encoding components of the spliceosome have been identified in a spectrum of human
malignancies, including ~60% of patients with myelodysplastic syndromes (MDS) (Bejar et al., 2012; Papaemmanuil et al.,
2013; Yoshida et al., 2011). These mutations occur most commonly in SF3B1 (Splicing Factor 3b Subunit 1), SRSF2
(Serine/arginine-Rich Splicing Factor 2), and U2AF1 (U2 Small Nuclear RNA Auxiliary Factor 1) and almost always as
heterozygous missense mutations that are mutually exclusive (Papaemmanuil et al., 2011; Wang et al., 2011; Yoshida et al.,
2011). Although the genetic data in MDS suggest that these alterations are critical to disease pathogenesis, it remains
unknown how these mutations contribute to MDS and whether they are sufficient to induce MDS.

Recent studies have suggested that mutations in the spliceosomal gene U2AF1 alter RNA splicing (Brooks et al., 2014;
Graubert et al., 2012; Ilagan et al., 2015; Przychodzen et al., 2013; Quesada et al., 2012), and studies of gene expression
in primary patient samples with and without U2AF1 mutations have been performed in an effort to identify downstream
mis-spliced genes that might contribute to abnormal hematopoiesis (Brooks et al., 2014; Graubert et al., 2012; Ilagan
et al., 2015). However, it remains unknown how these mutations contribute to hematopoietic transformation. To date, no
studies have investigated the in vivo effects of spliceosomal mutations expressed from the endogenous locus in the correct
cellular context, which might allow delineation of how these alleles contribute to MDS pathogenesis.

To test whether spliceosomal gene mutations are sufficient to drive MDS and determine how altered RNA splicing contributes
to transformation in vivo, we studied the biological and transcriptional consequences of mutations in SRSF2. SRSF2
mutations occur in 20%-30% of MDS and ~50% of chronic myelomonocytic leukemia (CMML) patients (Papaemmanuil et al.,
2013; Yoshida et al., 2011). SRSF2 is a member of the serine/arginine-rich (SR) protein family that contributes to both
constitutive and alternative splicing by binding to exonic splicing enhancer (ESE) sequences within pre-mRNA through its
RNA recognition motif domain (RRM) (Graveley and Maniatis, 1998; Liu et al., 2000; Schaal and Maniatis, 1999; Zahler
et al., 2004). SRSF2 mutations are consistently associated with adverse outcomes among MDS and acute myeloid leukemia
(AML) patients (Papaemmanuil et al., 2013; Vannucchi et al., 2013; Zhang et al., 2012). Despite the clinical importance of
SRSF2 mutations, to date there have been no studies of the functional impact of SRSF2 mutations on hematopoiesis or
splicing. Here we studied the biological and transcriptional effects of somatic expression of the common SRSF2P95H
mutation in the hematopoietic compartment.

RESULTS 

Srsf2P95H  Mutant  Mice  Develop  MDS,  a  Phenotype 
Distinct  from  Mice  with  Heterozygous  or  Homozygous 
Loss  of  Srsf2 

Given the genetic heterogeneity of primary patient samples as well as the fact that stable overexpression of spliceosomal
proteins, even in wild-type (WT) form, is poorly tolerated (Lareau et al., 2007), we first generated a murine model for
conditional expression of the commonly occurring SRSF2P95H mutation from the endogenous murine locus of Srsf2
(Figure 1A; Figures S1A and S1B). Mice heterozygous for the Srsf2P95H allele (Srsf2P95H/WT) were crossed to Mx1-cre
transgenic mice (Kuhn et al., 1995) on a C57BL/6 background to allow for inducible expression of Cre recombinase
following intraperitoneal injection of polyinosine-polycytosine (pIpC) (12 µg/g every other day for 3 days by injection,
as described previously [Moran-Crusio et al., 2011; Figures S1C and S1D; Supplemental Experimental Procedures]). mRNA
sequencing (RNA-seq) analysis of hematopoietic stem/progenitor cells (HSPCs) 2 weeks after the last pIpC injection of
6-week-old Mx1-cre Srsf2P95H/WT and Mx1-cre Srsf2 WT control mice confirmed heterozygous expression of the mutant
allele in equal proportion to the remaining WT Srsf2 allele in Mx1-cre Srsf2P95H/WT mice (Figure 1B).
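
Expression of the mutant allele "in equal proportion" to the WT allele corresponds to a mutant allele fraction near 0.5 among RNA-seq reads covering the P95 codon. The following is a minimal sketch of that calculation; the function names, the read counts, and the tolerance of 0.1 are hypothetical choices for illustration and are not taken from the paper.

```python
def mutant_allele_fraction(mutant_reads: int, wt_reads: int) -> float:
    """Fraction of reads at the variant position that carry the mutant allele."""
    total = mutant_reads + wt_reads
    if total == 0:
        raise ValueError("no reads cover the variant position")
    return mutant_reads / total


def is_balanced(fraction: float, tolerance: float = 0.1) -> bool:
    """True if the allele fraction is within `tolerance` of 0.5.

    The tolerance is an arbitrary illustrative threshold, not a published cutoff.
    """
    return abs(fraction - 0.5) <= tolerance


# Hypothetical read counts for illustration only.
fraction = mutant_allele_fraction(mutant_reads=48, wt_reads=52)
print(fraction, is_balanced(fraction))
```

A formal assessment would typically use a binomial test against an expected fraction of 0.5 rather than a fixed tolerance.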

It is currently unknown whether the heterozygous SRSF2P95H mutation confers a gain of function, haploinsufficient loss of
function, or dominant-negative loss of function. We therefore compared expression of the Srsf2P95H mutation with the
conditional loss of Srsf2 in vivo (Wang et al., 2001). Bone marrow (BM) mononuclear cells (MNCs) from 6-week-old CD45.2
Mx1-cre Srsf2 WT, Mx1-cre Srsf2fl/WT (heterozygous floxed mice for inducible deletion of one copy of Srsf2), Mx1-cre
Srsf2fl/fl (homozygous floxed mice for inducible deletion of both copies of Srsf2), and Mx1-cre Srsf2P95H/WT mice were
transplanted into lethally irradiated congenic CD45.1 recipient mice, followed by pIpC injection 4 weeks later (note that
all mice were treated with pIpC to control for any potential phenotypic effects of pIpC administration on biological or
splicing phenotypes). This was done to assess the phenotypic effects of Srsf2 deletion or mutation in a hematopoietic
cell-autonomous manner. Western blot (WB) analysis revealed the deletion of Srsf2 in BM MNCs from Mx1-cre Srsf2fl/fl
mice and normal total Srsf2 levels in Mx1-cre Srsf2P95H/WT BM MNCs (Figure S1E). Significant leukopenia and anemia
were seen in mice with homozygous Srsf2 deletion or heterozygous expression of the P95H mutation 18 weeks
post-transplant (Figures 1C and 1D) and were also seen at earlier time points (Figures S1F and S1G). The presence of
similar cytopenias in mice bearing a homozygous Srsf2 deletion and a heterozygous Srsf2P95H point mutation suggested a
possible dominant-negative function imposed by the P95H mutation. However, the anemia in Srsf2P95H mice was
characterized by increased mean corpuscular volume (MCV) of red blood cells relative to WT mice or mice with loss of one
to two copies of Srsf2 (Figure 1E). Moreover, histological assessment of mice 14 weeks post-pIpC revealed prominent BM
aplasia in Srsf2 homozygous knockout (KO) mice, whereas mice expressing the heterozygous P95H mutation had normal BM
cellularity (Figure 1F). Platelet counts were normal in Srsf2P95H mutant mice at all time points examined (Figure S1H).

Given that macrocytic anemia, a hallmark of anemia in MDS, was present in Srsf2P95H mutant mice, we next performed
cytological examination of peripheral blood and bone marrow smears from Mx1-cre Srsf2 WT, Mx1-cre Srsf2fl/fl, and
Mx1-cre Srsf2P95H/WT mice to assess for morphologic dysplasia. This revealed prominent myeloid and erythroid dysplasia
in Srsf2P95H mice but not in Mx1-cre Srsf2 WT or Mx1-cre Srsf2fl/fl mice (Figure 1G; Figure S1I). Myeloid dysplasia was
apparent based on detection of hypolobated and hypogranulated neutrophils, whereas erythroid dysplasia was evident
based on nuclear irregularities and cytoplasmic vacuolization and blebbing in erythroid precursors. Overall, these results
indicate that the Srsf2P95H mutation results in morphologic dysplasia and cytopenias with preserved marrow cellularity,
features that are characteristic of human MDS, whereas complete loss of Srsf2 is incompatible with hematopoiesis.

Figure 1. Conditional Expression of Srsf2P95H Results in Myeloid Dysplasia, a Phenotype Distinct from Heterozygous or
Homozygous Loss of Srsf2

(A) Depiction of the Srsf2P95H allele.

(B) RNA-seq of LSK cells in Mx1-cre Srsf2 WT and Mx1-cre Srsf2P95H/WT mice.

(C-E) White blood cell (WBC) count (C), hemoglobin (Hb) (D), and MCV (E) of red blood cells of CD45.1 recipient mice
18 weeks following noncompetitive transplantation of bone marrow from CD45.2+ Mx1-cre Srsf2 WT, Mx1-cre Srsf2fl/WT,
Mx1-cre Srsf2fl/fl, and Mx1-cre Srsf2P95H/WT mice (n = 10 mice/genotype for all genotypes except Mx1-cre Srsf2fl/WT,
where n = 5; pIpC was administered to recipient mice 4 weeks following transplantation).

(F and G) H&E staining of femurs (scale bars, 50 µm) (F) and peripheral blood smears (G) from Mx1-cre Srsf2 WT, Mx1-cre
Srsf2fl/fl, or Mx1-cre Srsf2P95H/WT mice (scale bars, 10 µm). A representative neutrophil (left) and erythroid precursor
(right) is shown for Srsf2 WT and KO mice. Mx1-cre Srsf2P95H cells were marked by hypolobated and hypogranulated
neutrophils (left two photos) and nuclear irregularities as well as cytoplasmic vacuolization and blebbing of erythroid
precursors (right two photos).

Error bars represent mean ± SD. ***p < 0.001; ****p < 0.0001. See also Figure S1.


Given that mutations in SRSF2 occur as early genetic events in MDS pathogenesis (Papaemmanuil et al., 2013) and that MDS
is characterized by expansion of HSPCs, we next examined HSPC numbers and function in Srsf2P95H mice. Analysis of
CD45.2+ HSPC subsets from Mx1-cre Srsf2P95H/WT mice and littermate controls 14 weeks after pIpC injection revealed
expansion of lineage-negative Sca1+ c-Kit+ (LSK) and restricted hematopoietic progenitor cells (LSK CD48+ CD150+;
hematopoietic progenitor cell fraction 2 [HPC-2]; Oguro et al., 2013) in mutant mice relative to controls (Figures 2A
and 2B). A similar LSK expansion was seen in spleens of Srsf2P95H mutant mice (although splenomegaly was not observed up
to 20 weeks post-pIpC) (Figures S2A and S2B). Because the detection of increased HSPCs in Srsf2P95H mutant mice appeared
paradoxical given the decreased peripheral blood counts in these same mice, we next examined the cell cycle kinetics and
apoptosis of Srsf2 mutant HSPCs. Indeed, Srsf2P95H LSK cells were characterized by an increase in the proportion of cells
in S-phase as well as in early apoptosis (Figures 2C-2E). Despite HSPC expansion in Srsf2P95H mutant mice, purified LSK
cells from mice with a homozygous Srsf2 deletion or heterozygous Srsf2P95H mutation had similarly impaired colony
formation and serial re-plating capacity in vitro (Figure S2C).

Figure 2. Conditional Expression of Srsf2P95H Results in Expansion of Hematopoietic Stem and Progenitor Cells with
Increased Cell Proliferation and Apoptosis

(A and B) Enumeration (A) and fluorescence-activated cell sorting (FACS) analysis (B) of BM LSK cells, long-term
hematopoietic stem cells (LT-HSC), restricted hematopoietic progenitor cell fractions 1 (HPC-1) and 2 (HPC-2), and
multipotent progenitor (MPP) cells (Oguro et al., 2013) in 12-week-old Mx1-cre Srsf2 WT and Mx1-cre Srsf2P95H/WT mice
(n = 5 mice/genotype).

(C) Cell cycle analysis of LSK cells from Mx1-cre Srsf2 WT or Mx1-cre Srsf2P95H/WT mice with in vivo bromodeoxyuridine
(BrdU) administration. A representative FACS plot analysis shows gating on LSK cells followed by BrdU versus
4’,6-diamidino-2-phenylindole (DAPI) stain (left).

(D) Relative quantification of the percentage of LSK cells in S, G2M, and G1 phase is shown on the right (n = 8 mice per
group).

(E) Relative quantification of the percentage of Annexin V+/DAPI- LSK cells (n = 8 mice/genotype). C, control; KI, knockin.

Error bars represent mean ± SD. *p < 0.05, **p < 0.01, ****p < 0.0001. See also Figure S2.



To  assess  the  functional  effects  of  Srsf2  alterations  on  HSC 
self-renewal  in  vivo,  we  next  compared  Srsf2  heterozygous  KO, 
homozygous  KO,  and  heterozygous  P95H  mutant  mice  in 
competitive  transplantation  assays  (Figure  3A).  Equal  numbers 
of  BM  MNCs  from  CD45.1  WT  mice  and  CD45.2  Mxl-cre  Srsf2 
WT,  Mxl-cre  Srs/2fl/WT,  Mxl-cre  Srsf2fl/fl,  or  Mxl-cre 
Srs/2P95H/WT  mice  were  transplanted  into  lethally  irradiated 
CD45.1  mice,  followed  by  pIpC  injection  4  weeks  later.  An 
assessment  of  peripheral  blood  chimerism  monthly  thereafter  re¬ 
vealed  a  complete  loss  of  CD45.2  chimerism  in  mice  transplanted 
with  Mxl-cre  Srs/2fl/fl  cells  and  a  significant  decrease  in  chime¬ 
rism  in  mice  transplanted  with  Mxl-cre  Srs/2P95H/WT  cells  (Fig¬ 
ure  3B;  Figures  S3A  and  S3B).  However,  an  analysis  of  BM  LSK 
chimerism  18  weeks  post-transplant  revealed  an  increase  in 
CD45.2+ HSPCs derived from Srsf2P95H mice relative to other

Figure 3. Srsf2P95H Mutation Impairs Hematopoietic Stem Cell Self-Renewal in a Manner Distinct from Srsf2 Loss

(A) Depiction of a competitive BM transplantation assay. pIpC, polyinosinic-polycytidylic acid.

(B) Percentage of CD45.2+ chimerism in the peripheral blood of recipient mice (n = 10 mice/genotype).

(C and D) Chimerism (C) and flow cytometric enumeration (D) of CD45.2+ LSK (left) and MP (lineage-negative Sca1- c-Kit+, right) cells in the BM of Mx1-cre
Srsf2 WT, Mx1-cre Srsf2fl/WT, Mx1-cre Srsf2fl/fl, and Mx1-cre Srsf2P95H/WT mice 14 weeks after pIpC injection.

Error bars represent mean ± SD. **p < 0.001, ***p < 0.0002, ****p < 0.0001. See also Figure S3.


groups and a near complete absence of CD45.2+ HSPCs from
Srsf2fl/fl mice (Figures 3C and 3D; Figure S3C). Serial competitive
transplantation of whole bone marrow from Srsf2P95H, Srsf2 heterozygous
KO, and Srsf2 WT primary transplant recipient mice
continued to reveal an impaired reconstitution capacity of
Srsf2P95H mutant mice relative to Srsf2 heterozygous KO or control
mice (Figure S3D). Of note, colony assays and competitive
transplantation experiments were performed using multiple genotypes
of control mice (Cre-negative Srsf2P95H mice as well
as Mx1-cre Srsf2 WT mice; Figures S2C and S3E) to control for
any possible confounding effect of Cre expression or the presence
of the unexcised P95H knockin allele.

The fact that Mx1-cre Srsf2P95H/WT mice had an increase in
HSPCs despite impaired formation of mature peripheral blood
cells suggested that mutant Srsf2 was associated with impaired
HSPC differentiation. Flow cytometric analysis of mature and
intermediate precursor cell subsets in Srsf2P95H mice was therefore
performed to identify the stage of impaired hematopoiesis.
This revealed that peripheral leukopenia was predominantly
due to decreased peripheral blood B cells, evident at all stages
of B lymphopoiesis following the transition of pre-proB to proB
cells, in Srsf2P95H mice relative to controls (Figures S3F and
S3G). Moreover, immunophenotypic analysis of intermediate
hematopoietic progenitors (Pronk et al., 2007) revealed deficits in
early erythroid progenitors in Srsf2P95H mice relative to controls,
initiating at the pre-MegE and pre-colony-forming unit-erythroid
stages (Figures S3H and S3I). Given prior data
showing that homozygous deletion of Srsf2 resulted in defective
T cell maturation and CD45 splicing (Wang et al., 2001), we also
examined thymic T cell differentiation and CD45 isoform expression
in Srsf2P95H mice relative to controls (Figures S3J and
S3K). This revealed no effect of Srsf2P95H mutation on thymic
T cell maturation or protein expression of the specific CD45 isoforms
identified previously to be downregulated with homozygous
deletion of Srsf2 (Wang et al., 2001).

Collectively, the biological analysis of Srsf2P95H mutant mice
identified phenotypes distinct from mice with a partial or complete
loss of Srsf2, suggesting that SRSF2 mutations alter
SRSF2’s normal function rather than resulting in haploinsufficiency
or a dominant-negative function. Of note, despite the
impaired hematopoietic differentiation, increase in HSPC subsets,
and morphologic dysplasia in Srsf2P95H/WT mice, no
Srsf2P95H mutant mice developed acute myeloid leukemia in
up to 70 weeks of observation.

SRSF2  Mutations  Are  Associated  with  Global  Alterations 
of  Gene  Expression  and  Splicing 

We next sought to identify the transcriptional and post-transcriptional
alterations caused by SRSF2 mutations through RNA-seq
of purified LSK and myeloid progenitor (MP, lineage-negative
Sca1- c-Kit+) populations. This was performed 4 weeks after
pIpC administration. In an unsupervised cluster analysis based
on coding gene expression, samples clustered first by cell type
and then by genotype (Figure S4A). The expression of several
hematopoietic regulators was altered in Srsf2P95H mutant cells,
including upregulation of Gfi1, Cebpe, and Hoxb2 in LSK cells;
downregulation of Gata1 and Gata2 in MP cells; and downregulation
of Cdkn1a in both populations. In addition, we observed
preferential down- versus upregulation of the expression of coding
genes in Srsf2 mutant cells relative to the WT (Figures S4B
and S4C). Gene ontology (GO) analysis revealed an enrichment
for the downregulation of genes in both LSK and MP cells
involved in the regulation of cell cycle, proliferation, differentiation,
and apoptosis (upregulated genes were not enriched for
these processes; Figure S4D).

To identify changes in splicing driven by SRSF2 mutations that
might contribute to disease, we augmented our mouse data with
RNA-seq data from primary CMML (n = 13; 3 with SRSF2 mutation)
and AML (n = 9; 5 with SRSF2 mutation) patient samples
(Table S1) as well as K562 cells ectopically expressing an empty
vector or a single allele of SRSF2 (WT, P95H, P95L, or P95R). In
all sequenced patients with SRSF2 mutations, the WT and
mutant alleles were expressed at similar levels (Table S1), as
was the case for the Srsf2P95H mouse cells (Figure 1). Similarly,
isogenic K562 cells with lentiviral expression of WT or mutant
SRSF2 expressed WT and mutant SRSF2 at roughly equal
levels (Figures S4E-S4G). We quantified global changes in
splicing of ~125,000 alternative splicing events and ~160,000
constitutive splice junctions associated with SRSF2 mutations
in these five datasets (LSK, MP, CMML, AML, and K562). We
required a minimum change in isoform ratio of 10% to call an
event differentially spliced (where a change in isoform ratio is
defined as an absolute, rather than relative, quantity: the increase
or decrease in the percentage of all mRNAs transcribed
from the parent gene that follow a given splicing pattern). In all
datasets, SRSF2 mutations were associated with differential
splicing of all classes of splicing events as well as novel alternative
splicing and intron retention of splice junctions annotated as
constitutively spliced. However, only a relatively small fraction of
alternatively spliced events of any class were affected by SRSF2
mutations (Figure S4H). SRSF2 mutations were associated with
a mild bias toward exon skipping but did not lead to globally
increased levels of predicted substrates for degradation by
nonsense-mediated decay.
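The absolute (rather than relative) 10% threshold described above can be illustrated with a minimal sketch. The `SplicingEvent` class, the function names, and the toy isoform ratios are hypothetical and are not taken from the study; the statistical support that a real pipeline would also require (for example a Bayes factor) is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class SplicingEvent:
    event_id: str
    psi_wt: float      # fraction (0-1) of parent-gene mRNAs following this pattern, WT
    psi_mutant: float  # same quantity in the SRSF2-mutant sample


def delta_psi(event: SplicingEvent) -> float:
    """Absolute change in isoform ratio (mutant minus WT), as a fraction."""
    return event.psi_mutant - event.psi_wt


def is_differentially_spliced(event: SplicingEvent, min_abs_change: float = 0.10) -> bool:
    """Call an event when the isoform ratio shifts by at least 10 percentage points."""
    # Small tolerance so a shift of exactly 10 points is not lost to float rounding.
    return abs(delta_psi(event)) >= min_abs_change - 1e-12


# Toy values for illustration only (not measured data).
events = [
    SplicingEvent("exon_a", 0.05, 0.08),  # +3 points absolute (60% relative): not called
    SplicingEvent("exon_b", 0.40, 0.55),  # +15 points: called
    SplicingEvent("exon_c", 0.90, 0.78),  # -12 points: called
]
called = [e.event_id for e in events if is_differentially_spliced(e)]
```

Note that `exon_a` changes by 60% in relative terms but only 3 percentage points in absolute terms, so it is not called under this definition.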

SRSF2  Mutations  Alter  Exonic  Splicing  Enhancer 
Preference  but  SRSF2  Loss  Does  Not 

Because SRSF2 normally recognizes ESE elements within the
pre-mRNA to promote exon recognition (Graveley and Maniatis,
1998; Liu et al., 2000; Schaal and Maniatis, 1999; Zahler et al.,
2004), we hypothesized that SRSF2 mutations might alter its
normal sequence-specific activity. To test this, we performed
an ab initio motif identification screen. We quantified the occurrence
of each possible k-mer (k = 4, 5, 6) within cassette exons
that were differentially spliced in Srsf2P95H MP cells and identified
k-mers that were enriched or depleted in cassette exons
promoted versus repressed in Srsf2P95H cells. We identified enriched
and depleted motifs using a non-parametric (Kolmogorov-Smirnov)
statistical test with a p value threshold of 0.05.
Significantly enriched k-mers were C-rich, whereas depleted
k-mers were G-rich (Figures S4I and S4J). We then performed
an identical analysis using our K562 data, which likewise identified
CCAG and GGTG as the most enriched and depleted
consensus motifs, respectively (Figures 4A and 4B). A recent solution
structure of SRSF2 in complex with RNA revealed that
SRSF2 has a consensus motif of SSNG (where “S” represents


Figure  4.  SRSF2  Mutations  Alter  Exonic  Splicing  Enhancer  Preference 

(A) Scatterplot of cassette exon inclusion in K562 cells expressing empty vector or SRSF2P95R. Percentages indicate the percent of alternatively spliced cassette
exons with increased or decreased inclusion. Red and blue dots represent individual cassette exons that are promoted or repressed in SRSF2P95R versus empty
vector cells, respectively. Promoted and repressed cassette exons are defined as those whose inclusion levels are increased or decreased by >10% with a
Bayes factor of >5, as estimated by Wagenmakers’ framework (Wagenmakers et al., 2010).

(B) Enriched (right) and depleted (left) k-mers in cassette exons promoted versus repressed in SRSF2P95R versus WT cells.

(C)  Scatterplot  of  cassette  exon  inclusion  in  TF-1  cells  following  transfection  with  a  siRNA  against  SRSF2  or  a  control  non-targeting  siRNA  (KD,  knockdown). 
Percentages  indicate  the  percent  of  alternatively  spliced  cassette  exons  with  increased  or  decreased  inclusion. 

(D)  Enriched  (right)  and  depleted  (left)  k-mers  in  cassette  exons  promoted  versus  repressed  in  SRSF2  KD  versus  control  cells. 

(E)  Mean  enrichment  of  all  variants  of  the  SSNG  motif  in  cassette  exons  promoted  versus  repressed  in  TF-1  cells  following  SRSF2  knockdown  and  K562,  LSK,  and 
MP  cells  expressing  WT  or  mutant  SRSF2.  Error  bars  indicate  95%  confidence  intervals  estimated  by  bootstrapping. 

(F)  Relative  frequency  of  CCNG  and  GGNG  motifs  in  cassette  exons  promoted  versus  repressed  by  SRSF2  mutations  in  LSK  and  MP  cells  (top),  K562  cells  (left), 
and primary AML and CMML samples with or without SRSF2 mutations (right) (the sample numbers correspond to the patient identifiers in Table S1). Shading
indicates  95%  confidence  interval  by  bootstrapping.  The  schematic  illustrates  a  portion  of  a  metagene  containing  the  differentially  spliced  cassette  exon.  From 
left  to  right,  the  features  are  the  upstream  exon  (gray  box)  and  intron  (black  line),  the  cassette  exon  (black  box,  vertical  dashed  lines),  and  the  downstream  intron 
(black  line)  and  exon  (gray  box).  Horizontal  axis,  genomic  coordinates  defined  with  respect  to  the  5'  and  3'  splice  sites  where  0  is  the  splice  site  itself.  Vertical  axis, 
relative  frequency  of  the  indicated  motifs  over  genomic  loci  containing  cassette  exons  promoted  versus  repressed  by  SRSF2  mutations  (log  scale). 

See  also  Figure  S4. 


Cancer Cell 27, 617-630, May 11, 2015 ©2015 Elsevier Inc.




C or G) and efficiently recognizes both CCNG and GGNG (Daubner
et al., 2012). Therefore, our ab initio analysis suggested that
mutations affecting the P95 residue may alter SRSF2’s ability to
recognize variants of its normal SSNG motif.
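A minimal sketch of such an ab initio k-mer screen follows, assuming that each k-mer is scored by its per-exon density and that the densities in promoted versus repressed exons are compared with a two-sample Kolmogorov-Smirnov statistic. The exact quantity tested in the study is not specified here, so this layout is an assumption; the function names and toy sequences are hypothetical, and only the KS statistic D (not its p value) is computed.

```python
from itertools import product


def all_kmers(ks=(4, 5, 6), alphabet="ACGT"):
    """Enumerate every possible k-mer for the given k values."""
    return ["".join(p) for k in ks for p in product(alphabet, repeat=k)]


def kmer_density(seq, kmer):
    """Overlapping occurrences of kmer divided by the number of start positions."""
    n = len(seq) - len(kmer) + 1
    if n <= 0:
        return 0.0
    hits = sum(1 for i in range(n) if seq[i:i + len(kmer)] == kmer)
    return hits / n


def ks_statistic(xs, ys):
    """Two-sample KS statistic D = max over t of |F_x(t) - F_y(t)|."""
    xs, ys = sorted(xs), sorted(ys)
    d = 0.0
    for t in sorted(set(xs) | set(ys)):
        fx = sum(1 for x in xs if x <= t) / len(xs)
        fy = sum(1 for y in ys if y <= t) / len(ys)
        d = max(d, abs(fx - fy))
    return d


# Toy exon sequences for illustration only (not real cassette exons).
promoted = ["CCAGCCAG", "ACCAGTCC"]
repressed = ["GGTGGGTG", "AGGTGAGG"]

d_ccag = ks_statistic(
    [kmer_density(s, "CCAG") for s in promoted],
    [kmer_density(s, "CCAG") for s in repressed],
)
```

In a full screen, the same comparison would be repeated for every k-mer returned by `all_kmers()` and followed by a significance cutoff such as the p value threshold of 0.05 quoted above.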

To further explore this hypothesis, we compared the relative
enrichment of all four SSNG variants in cassette exons that
were differentially spliced upon depletion of SRSF2, overexpression
of WT SRSF2, or expression of mutant SRSF2. SRSF2
depletion, achieved by knockdown of endogenous SRSF2 in
the absence of mutant protein expression (Figure S4K), caused
preferential skipping of cassette exons, consistent with SRSF2’s
canonical role in promoting exon recognition (Figure 4C). Ab initio
motif analyses identified both C- and G-rich variants of the
SSNG motif as the most enriched motifs in cassette exons that
were repressed following SRSF2 depletion (Figure 4D). Quantitation
of the enrichment of each SSNG variant revealed that all
were associated with exon repression following knockdown. In
contrast, overexpression of WT SRSF2 was associated with
enrichment of each SSNG variant (Figure 4E). These data suggest
that different SSNG variants function as equally efficacious
SRSF2-dependent ESEs, consistent with SRSF2’s in vitro binding
specificity (Daubner et al., 2012). In contrast, K562 cells as
well as LSK and MP cells expressing mutant Srsf2 exhibited
enrichment for CCNG and depletion for GGNG in exons that
were promoted versus repressed (Figure 4E).
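The comparison of SSNG variants can be sketched as follows, assuming enrichment is summarized as the log2 ratio of motif density in promoted versus repressed cassette exons. This summary statistic, the pseudocount, the function names, and the toy sequences are illustrative assumptions rather than the study's actual procedure.

```python
import math
import re
from itertools import product


def expand_ss_variants():
    """'S' is C or G; 'N' is kept as a wildcard: CCNG, CGNG, GCNG, GGNG."""
    return [a + b + "NG" for a, b in product("CG", repeat=2)]


def count_motif(seq, motif):
    """Count overlapping matches of a motif in which N matches any base."""
    pattern = "(?=" + motif.replace("N", "[ACGT]") + ")"
    return len(re.findall(pattern, seq))


def log2_enrichment(promoted, repressed, motif, pseudocount=0.5):
    """log2 of motif density in promoted exons over density in repressed exons."""
    def density(seqs):
        hits = sum(count_motif(s, motif) for s in seqs)
        positions = sum(max(len(s) - len(motif) + 1, 0) for s in seqs)
        return (hits + pseudocount) / (positions + pseudocount)
    return math.log2(density(promoted) / density(repressed))


# Toy exon sequences for illustration only (not real cassette exons).
promoted = ["CCAGCCTG"]
repressed = ["GGAGGGTG"]
scores = {m: log2_enrichment(promoted, repressed, m) for m in expand_ss_variants()}
```

Under this convention, a positive score for CCNG together with a negative score for GGNG would correspond to the pattern reported for cells expressing mutant SRSF2.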

To test whether this motif enrichment and depletion was due
to ESE activity, we computed the spatial distribution of CCNG
and GGNG motifs across genomic loci containing cassette
exons that were promoted or repressed in association with
SRSF2 mutations. CCNG and GGNG were, respectively, enriched
and depleted specifically over cassette exons and not
over the flanking introns or exons. We observed similar motif
preferences and distributions in patient transcriptomes (Figure 4F).
Because CCNG/GGNG motifs were not consistently
enriched/depleted in introns flanking differentially spliced
cassette exons, and because we were unable to identify enriched
motifs with ab initio searches in introns, we conclude
that differential cassette exon splicing is likely due primarily to
altered recognition of exonic motifs. Together, these data reveal
spatially restricted enrichment of specific ESEs in association
with SRSF2 mutations and suggest that SRSF2 mutations
cause alteration rather than loss of normal ESE recognition
activity.

SRSF2  Proline  95  Mutations  Alter  RNA  Binding 
Specificity  by  Changing  the  Conformation  of  Both  RRM 
Termini 

We next tested whether this association between SRSF2 mutations
and enrichment/depletion of specific ESEs was due to
altered SRSF2:RNA interactions. We purified SRSF2’s RNA recognition
motif (RRM) as described previously and performed isothermal titration
calorimetry (ITC) with the RNA ligand 5'-uCCAGu-3', an optimal
SRSF2 target according to the SSNG consensus sequence
(Daubner et al., 2012). All three P95 mutations resulted in an
increase in binding affinity of 3.9- to 4.5-fold relative to WT SRSF2
(Figures 5A and 5B; Figure S5A), consistent with the enrichment
for CCNG motifs that we observed in exons promoted by SRSF2
mutations (Figure 4B). We next tested whether P95 mutations
resulted in altered RNA binding specificity. In contrast to the
results with 5'-uCCAGu-3' RNA, ITC measurements revealed that all three
P95 mutants exhibited a 1.2- to 2.1-fold decrease in binding
affinity to the 5'-uGGAGu-3' RNA relative to WT SRSF2 (Figures 5A
and 5B; Figure S5B). ITC measurements using the RNA sequences
5'-uGCAGu-3' and 5'-uCGAGu-3' revealed that G > C
substitutions at the second motif position resulted in larger
increases in binding affinity than at the first motif position (2.6- to
3.4-fold versus 1.1- to 1.8-fold; Figure 5B; Figures S5C and
S5D). The RNA binding preferences measured by ITC were
remarkably consistent with the ESE enrichment identified by
RNA-seq. For each mutant, the level of motif enrichment (Figure 4E)
was roughly proportional to the affinity increase (Figure 5C),
and the enrichment and affinity measurement supported
the same relative preference for each specific motif (CC > GC >
CG > GG). This strongly supports the notion that the splicing
changes caused by P95 mutations are the result of an altered
sequence specificity of the SRSF2 RRM.
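The fold changes quoted above can be related to dissociation constants with a short worked sketch, assuming that affinity is taken as inversely proportional to Kd so that the mutant-versus-WT fold change is Kd(WT)/Kd(mutant). The Kd values below are hypothetical placeholders chosen only so that the resulting fold changes fall near the reported ranges; they are not measurements from the study.

```python
def affinity_fold_change(kd_wt, kd_mutant):
    """Fold change in affinity of mutant versus WT (>1 means tighter binding)."""
    if kd_wt <= 0 or kd_mutant <= 0:
        raise ValueError("dissociation constants must be positive")
    return kd_wt / kd_mutant


# Hypothetical Kd values (micromolar) for one mutant RRM; NOT measured data.
kd_wt = {"CCAG": 0.40, "GCAG": 0.45, "CGAG": 0.60, "GGAG": 0.40}
kd_p95h = {"CCAG": 0.10, "GCAG": 0.15, "CGAG": 0.40, "GGAG": 0.60}

fold = {m: affinity_fold_change(kd_wt[m], kd_p95h[m]) for m in kd_wt}

# Rank motifs from largest affinity gain to largest affinity loss.
ranking = sorted(fold, key=fold.get, reverse=True)
```

With these placeholder inputs the ranking follows the CC > GC > CG > GG order described in the text, and the GGAG fold change is below 1, which corresponds to a decrease in affinity.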

P95 is located at the C-terminal end of the SRSF2 RRM, and
the published solution structure of SRSF2 in complex with
5'-uCCAGu-3' revealed extensive contacts of P95 with the second
cytosine (Figure S5E), emphasized by several intermolecular nuclear
Overhauser effects (NOEs) (Daubner et al., 2012). To test
whether SRSF2’s RNA binding surface was altered by P95 mutations,
we conducted nuclear magnetic resonance (NMR) titration
with the SRSF2 P95H RRM and the 5'-uCCAGu-3' RNA and
assigned the backbone of this complex using standard heteronuclear
NMR experiments. Mapping of the chemical shift perturbations
revealed that the RNA-binding surface of the RRM is not
disturbed by the P95H mutation. However, both termini experienced
large changes in their environment (Figure 5D), an observation
that held true for all three P95 mutations (Figure S5F).
Consistent with our ESE and ITC analyses, this relocation of
termini primarily affected the second cytosine, which exhibited
the largest chemical shift perturbations of its proton resonances
(Figures S5G and S5H). Smaller changes of chemical shifts were
observed when P95 mutants were bound to 5'-uGGAGu-3' (Figure S5H).
Together, our experiments indicate that SRSF2 mutations
change SRSF2’s normal RNA-binding affinity and specificity
in vitro, likely explaining the widespread alterations in
ESE preference we observed in vivo.

Mutant  SRSF2  Promotes  Mis-splicing  and  Degradation 
of  EZH2 

We next used our transcriptome data to identify common
changes in splicing driven by SRSF2 mutations that might
contribute to disease. Intersection of differentially spliced genes
in LSK, MP, CMML, and AML samples identified 75 genes differentially
spliced in association with SRSF2 mutations in both LSK
and MP cells and at least one primary patient cohort, as well as an
additional 97 (LSK) and 87 (MP) genes differentially spliced in one
mouse cell population, but not the other, and in a patient
cohort (Figure 6A; Tables S2-S5). Many of these genes have a
known importance in myeloid malignancies. For example,
SRSF2 mutations promoted the inclusion of a highly conserved
“poison” cassette exon of EZH2 (Enhancer of zeste homolog
2) and repressed a frame-preserving cassette exon of BCOR
(BCL6 corepressor) (Figure 6A; Figures S6A and S6B). Of note,
we did not identify altered splicing of CD45 in SRSF2 mutant
cells (Tables S2-S5), which has been noted previously as being

Figure 5. Proline 95 Mutations Change the RNA-Binding Specificity of the SRSF2 RRM In Vitro and Lead to Relocation of the N and C
Termini

(A) ITC raw data and binding curve for the SRSF2 RRM P95H mutant with 5'-uCCAGu-3' and 5'-uGGAGu-3' RNA.

(B) Change in RNA-binding affinity (percent) for SRSF2 RRM P95H (blue), P95L (green), and P95R (black) mutants compared with WT (red) (Daubner et al., 2012)
using RNA targets 5'-uCCAGu-3', 5'-uGCAGu-3', 5'-uCGAGu-3', and 5'-uGGAGu-3'.

(C) Change in RNA-binding specificity of SRSF2 RRM WT, P95H, P95L, and P95R with 5'-uCCAGu-3' (blue), 5'-uGCAGu-3' (dark gray), 5'-uCGAGu-3' (light
gray), and 5'-uGGAGu-3' RNA (orange). Error bars represent mean ± SD.

(D) Left: overlay of 2D [15N-1H] heteronuclear single quantum coherence (HSQC) of the wild-type (red) and P95H mutant (blue) bound to 5'-uCCAGu-3' RNA, with
negative peaks shown in green (WT) and light green (mutant). Right: difference of the chemical shift perturbations of the P95H mutant and wild-type. Positive
values (blue) with a higher perturbation with the P95H mutant and negative values (red) with a higher perturbation with the WT are shown. Missing assignments are
marked with gray bars and proline with a gray P. Residues with the highest difference are depicted in both the graph and spectra.

See also Figure S5.


altered  in  murine  Srsf2  KO  hematopoietic  cells  (Wang  et  al., 
2001). 
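The intersection logic behind the three gene groups described above (shared by both mouse populations and a patient cohort, or specific to one mouse population plus a patient cohort) can be sketched with set operations. The function name and the placeholder gene identifiers are hypothetical; the sketch assumes that mouse and human genes have already been mapped to common ortholog identifiers.

```python
def classify_shared_genes(lsk, mp, cmml, aml):
    """Split differentially spliced genes into the three groups described in the text."""
    patient = cmml | aml                 # at least one primary patient cohort
    both_mouse = (lsk & mp) & patient    # LSK and MP and a patient cohort
    lsk_only = (lsk - mp) & patient      # LSK but not MP, plus a patient cohort
    mp_only = (mp - lsk) & patient       # MP but not LSK, plus a patient cohort
    return both_mouse, lsk_only, mp_only


# Placeholder identifiers for illustration only (not genes from Tables S2-S5).
lsk = {"g1", "g2", "g3"}
mp = {"g1", "g4"}
cmml = {"g1", "g3"}
aml = {"g4", "g5"}

both_mouse, lsk_only, mp_only = classify_shared_genes(lsk, mp, cmml, aml)
```

Applied to the real gene lists, the sizes of these three sets would correspond to the 75, 97, and 87 genes reported above.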

To  identify  potential  functional  consequences  of  recurrent 
mis-splicing,  we  focused  on  the  splicing  event  in  EZH2. 
SRSF2  mutant  cells  exhibited  preferential  inclusion  of  a  poison 
cassette  exon  that  introduces  a  premature  termination  codon 
predicted  to  result  in  nonsense-mediated  decay  (NMD)  of 
EZH2  (Figures  6B  and  6C).  Both  the  poison  exon  itself  and  its 
flanking intronic sequences exhibited high sequence conservation
across vertebrates, exceeding the sequence conservation
exhibited by the upstream and downstream constitutive coding
exons themselves, which is a common feature of physiologically
important splicing events (Lareau et al., 2007; Ni et al.,
2007;  Figure  6B). 


We  validated  this  EZH2  splicing  change  using  both  qualitative 
and  quantitative  isoform-specific  RT-PCR  in  leukemia  cell  lines 
that  were  WT  or  mutant  for  SRSF2  (Figures  S6C  and  S6D)  as 
well  as  in  an  independent  panel  of  primary  AML  patient  samples 
with or without SRSF2 mutations (n = 8; 4 with SRSF2 mutations;
Figure  6D;  Figure  S6E). 

Next, to confirm whether the cassette exon promoted by
SRSF2 mutations triggers degradation by NMD, we measured
the half-life of the inclusion isoform of EZH2 in SRSF2P95H cells
transfected with a control or anti-UPF1 (a required NMD factor)
short hairpin RNA (shRNA) following transcriptional shutoff with
actinomycin D (’t Hoen et al., 2011; Figure 6E; Figures S6F and
S6G). The fact that the mRNA half-life of the inclusion isoform
of EZH2 was lengthened by UPF1 knockdown in these
experiments suggests that this particular isoform of EZH2, which
is promoted by mutant SRSF2, undergoes NMD. The half-life of a
well-characterized NMD substrate of SRSF3 (Lareau et al., 2007;
Ni et al., 2007) increased similarly following UPF1 knockdown,
confirming that UPF1 knockdown effectively inhibited NMD
(Figure S6H).
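A half-life comparison of this kind can be sketched under the common assumption of first-order decay after transcriptional shutoff, where the remaining mRNA fraction is exp(-kt) and the half-life is ln(2)/k. The fitting approach (ordinary least squares on log-transformed values), the function name, and the synthetic time courses are assumptions for illustration and are not the study's data or its stated analysis.

```python
import math


def fit_half_life(times_h, remaining_fraction):
    """Estimate mRNA half-life (hours) from a log-linear least-squares fit."""
    ys = [math.log(f) for f in remaining_fraction]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(ys) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, ys))
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    slope = sxy / sxx              # equals -k for first-order decay
    return math.log(2) / -slope


# Synthetic time courses generated from known half-lives (NOT measured data).
times = [0.0, 1.0, 2.0, 4.0]
control_shrna = [0.5 ** (t / 2.0) for t in times]   # built with a 2 h half-life
upf1_shrna = [0.5 ** (t / 4.0) for t in times]      # built with a 4 h half-life

t_half_control = fit_half_life(times, control_shrna)
t_half_upf1_kd = fit_half_life(times, upf1_shrna)
```

A longer fitted half-life after UPF1 knockdown than with the control shRNA is the pattern expected for a transcript that is degraded by NMD.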

Next, to identify whether the protein product of EZH2 is altered
in SRSF2 mutant cells, we performed WB analysis of a panel of
human AML cell lines WT (TF-1, K562) or mutant for SRSF2
(K052) (all WT for EZH2). This revealed lower EZH2 protein levels
as well as lower global levels of histone H3 lysine 27 trimethylation
(H3K27me3, a methylation mark placed by EZH2) in SRSF2
mutant K052 cells (Figure 6F). To further validate this finding in an
isogenic context, we performed WB analysis in K562 cells ectopically
expressing WT SRSF2 or SRSF2 P95H/L/R mutant cDNA.
This analysis revealed a consistent downregulation of EZH2
protein expression as well as global H3K27me3 in all three
SRSF2 mutant samples compared with SRSF2 WT K562 cells
(Figure 6G).

Consistent  with  SRSF2  mutations  promoting  a  disabling 
splicing  change  in  EZH2,  EZH2  loss-of-function  mutations  are 
common  in  MDS.  In  an  analysis  of  >1,800  MDS  patients  where 
EZH2  and  SRSF2  were  both  sequenced,  EZH2  loss-of-function 
mutations  were  mutually  exclusive  with  SRSF2  mutations  (p  < 
0.0001)  (Bejar  et  al.,  2012;  Ernst  et  al.,  2010;  Haferlach  et  al., 
2014;  Muto  et  al.,  2013;  Papaemmanuil  et  al.,  2013;  Figure  6H). 
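One way to quantify mutual exclusivity of two mutations is a one-sided Fisher's exact test on a 2x2 table of patient counts, sketched below. The source does not state which test produced the reported p value, so the choice of test is an assumption, and the counts are hypothetical rather than taken from the cited cohorts.

```python
from math import comb


def fisher_exact_less(a, b, c, d):
    """One-sided Fisher's exact p value for co-occurrence being as low as observed.

    a: both genes mutated    b: only gene 1 mutated
    c: only gene 2 mutated   d: neither gene mutated
    """
    row1 = a + b                      # patients with gene 1 mutated
    col1 = a + c                      # patients with gene 2 mutated
    n = a + b + c + d
    lo = max(0, row1 + col1 - n)      # smallest possible overlap
    denom = comb(n, col1)
    total = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(lo, a + 1))
    return total / denom


# Hypothetical counts for illustration only (NOT the cohort data cited above).
p_value = fisher_exact_less(a=0, b=30, c=12, d=158)
```

A small p value from this test indicates fewer co-mutated patients than expected if the two mutations occurred independently.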

The above data strongly link SRSF2 mutations to disabling
splicing of EZH2. We next sought to examine whether the
change in RNA ESE preference induced by SRSF2 mutations
caused EZH2 mis-splicing. We therefore cloned the genomic locus
containing the EZH2 poison exon and flanking introns and
constitutive exons to create a minigene that recapitulates this
splicing event. We identified three potential SRSF2-dependent
SSNG motifs in the poison exon (CCTG, CCTG, and GCAG),
one or more of which we expected to be better recognized by
mutant SRSF2 than WT SRSF2. We then mutated each motif
to the corresponding GG equivalent, both separately and in
combination (Figure 6I). Measuring cassette exon recognition
in K562 cells expressing WT or mutant SRSF2, we found that
the first motif was required for robust splicing change in
SRSF2 mutant cells, such that the mutation CCTG > GGTG prevented
an increase in poison exon recognition (Figure 6J). We
conclude that SRSF2 mutations induce a disabling splicing
change in EZH2 in an ESE-dependent manner, consistent with
altered RNA recognition activity.
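The motif substitutions used in such a minigene design can be sketched as below, assuming that the "GG equivalent" of an SSNG motif is obtained by replacing its first two bases with GG (so CCTG becomes GGTG and GCAG becomes GGAG). The exon sequence and motif positions are placeholders and are not the real EZH2 poison exon; enumerating every single and combined mutant is an illustrative reading of "separately and in combination".

```python
from itertools import combinations


def to_gg_equivalent(motif):
    """Replace the two S positions of an SSNG motif with GG."""
    if len(motif) != 4 or motif[0] not in "CG" or motif[1] not in "CG" or motif[3] != "G":
        raise ValueError(f"{motif!r} is not an SSNG motif")
    return "GG" + motif[2:]


def mutate_motifs(exon, motif_starts, which):
    """Return the exon with the selected motifs converted to their GG equivalents."""
    seq = list(exon)
    for idx in which:
        start = motif_starts[idx]
        seq[start:start + 4] = to_gg_equivalent(exon[start:start + 4])
    return "".join(seq)


# Placeholder sequence and positions (NOT the real EZH2 poison exon).
exon = "AACCTGTTCCTGAAGCAGTT"
motif_starts = [2, 8, 14]   # CCTG, CCTG, GCAG

# All single, double, and triple mutants.
constructs = {
    combo: mutate_motifs(exon, motif_starts, combo)
    for r in (1, 2, 3)
    for combo in combinations(range(3), r)
}
```

Each entry in `constructs` keeps the exon length unchanged, so only the candidate ESE is altered between minigene variants.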

We next sought to test whether restoring normally spliced
EZH2 mRNA could rescue hematopoiesis in SRSF2 mutant
cells. EZH2 full-length cDNA or an empty vector (both in a
retroviral ZsGreen1 vector) were overexpressed in c-Kit+
Srsf2P95H or WT cells, followed by assessment of methylcellulose
colony formation of c-Kit+/ZsGreen1+ cells. EZH2 cDNA
was equally overexpressed in Srsf2 mutant and WT cells (Figure S6I),
and Srsf2P95H mutant cells overexpressing full-length
EZH2 experienced an ~50% increase in colony formation relative
to Srsf2P95H mutant cells expressing an empty vector (Figure 6K;
Figure S6J). In contrast, EZH2 overexpression had no
substantial effect on initial colony formation in Srsf2 WT cells
(Figure 6K; Figure S6I). These data indicate that restoration of
normally spliced EZH2 mRNA in SRSF2 mutant cells at least
partially rescues the hematopoietic defects induced by mutant
SRSF2.

DISCUSSION 

The consistent occurrence of heterozygous point mutations
affecting highly restricted residues of spliceosomal proteins
strongly suggests a gain-of-function or dominant-negative activity
for these mutations in malignant transformation. Here we
identify an effect of the SRSF2P95H mutation distinct from loss
of SRSF2 and reveal that mutations in SRSF2 confer an alteration
in function that results in key aspects of MDS. This includes an
increase in HSPCs in Srsf2P95H mutant mice with impaired differentiation,
altered cell cycle kinetics, and increased apoptosis
resulting in peripheral cytopenias and morphologic dysplasia. By
contrast, WT Srsf2 appears to be constitutively required for
hematopoiesis.


Figure  6.  SRSF2  Mutant  Primary  Murine  and  Patient  Samples  Exhibit  Convergent  Splicing  Alterations 

(A)  Intersection  of  genes  exhibiting  differential  splicing  in  SRSF2  mutant  versus  WT  mouse  LSK  and  MP  cells  and  primary  AML  and  CMML  samples  (restricted  to 
orthologous  genes). 

(B)  Integrative  Genomics  Viewer  (IGV)/Sashimi  plot  illustrating  the  EZH2  cassette  exon  promoted  by  SRSF2  mutations  in  multiple  datasets  analyzed  here  (top)  (the 
patient numbers listed in the Sashimi plot correspond to the numbers in Table S1 detailing patient characteristics). The DNA sequence conservation of the locus,
as  estimated  by  phastCons  (Siepel  et  al.,  2005),  across  30  vertebrate  species  is  shown  in  the  track  below  the  Sashimi  plot. 

(C)  Bar  plot  describing  the  percentage  of  EZH2  transcripts  harboring  a  specific  cassette  exon  in  the  SRSF2  mutant  relative  to  WT  primary  AML  samples  from 
RNA-seq  data.  Error  bars  indicate  95%  confidence  intervals. 

(D)  RT-PCR  of  an  EZH2  exon  inclusion  event  in  an  independent  set  of  SRSF2  WT  and  mutant  AML  samples. 

(E) Quantitative RT-PCR of EZH2 inclusion isoform in SRSF2P95H mutant K052 cells with or without UPF1 knockdown and actinomycin D treatment.

(F) WB analysis for EZH2 and H3K27me3 in SRSF2/EZH2 WT (TF-1, K562) and SRSF2P95H mutant/EZH2 WT (K052) AML cell lines.

(G)  WB  analysis  for  EZH2,  H3K27me3,  and  FLAG  epitope  in  K562  cells  with  lentiviral  overexpression  of  N-terminal  FLAG-tagged  SRSF2  WT,  SRSF2P95H, 
SRSF2P95L,  or  SRSF2P95R  (left).  Relative  quantification  of  EZH2  protein  expression  by  WB  to  total  histone  H3  expression  in  K562  cells  expressing  SRSF2 
mutants  relative  to  WT  is  shown  on  the  right. 

(H) EZH2 and SRSF2 mutations are mutually exclusive in the sequencing of DNA from >1,000 MDS patients (Bejar et al., 2012; Ernst et al., 2010; Haferlach et al.,
2014;  Muto  et  al.,  2013;  Papaemmanuil  et  al.,  2013). 

(I)  Schematic  of  the  EZH2  cassette  exon  with  SSNG  motifs  highlighted  and  mutations  to  GG  equivalents  shown. 

(J) EZH2 cassette exon inclusion for minigenes containing the endogenous cassette exon or a cassette exon with mutation of motifs 1, 2, and/or 3 to the GG
equivalent.

(K) Photographs (left) and enumeration (right) of c-Kit+/ZsGreen1+ cells from Srsf2 WT or Srsf2P95H mice 14 days after overexpression of empty vector or EZH2
cDNA  and  plating  in  methylcellulose  medium. 

**p < 0.01, ****p < 0.0001. Error bars represent mean ± SD unless stated otherwise. See also Figure S6 and Tables S1-S5.

Transcriptional analysis of SRSF2 mutant cells revealed that
SRSF2 mutations result in genome-wide alterations in ESE preference
in both human and murine cells. Biochemical analysis of
the interaction of SRSF2 with RNA in cell-free in vitro assays
identified an analogous change in specificity of interactions between
SRSF2 and pre-mRNA induced by SRSF2 mutations.
This altered interaction of mutant SRSF2 with RNA appears to
be due to an effect of SRSF2 P95H/L/R mutations on the conformations
of the termini of SRSF2’s RRM domain, as revealed by
NMR spectroscopy. Our genomic and biochemical assays indicate
that SRSF2 mutations cause alteration rather than loss of
function, driving preferential recognition of cassette exons
containing C- versus G-rich ESEs.

The altered pre-mRNA recognition activity of mutant SRSF2
likely underlies the mis-splicing of key transcriptional regulators,
several of which have been implicated previously in
MDS pathogenesis. This includes promotion of a poison exon
of EZH2 that undergoes NMD and results in reduced EZH2 protein
expression in SRSF2 mutant cells. Loss-of-function mutations
in EZH2 occur in the same spectrum of myeloid
malignancies as SRSF2 mutations (Ernst et al., 2010; Nikoloski
et al., 2010), and loss of Ezh2 has been functionally linked to
MDS development in vivo (Muto et al., 2013). Moreover,
SRSF2 and EZH2 mutations are mutually exclusive in MDS patients
(Haferlach et al., 2014; Papaemmanuil et al., 2013), but
the basis for this observation was previously unknown. The
data here provide a mechanistic basis for this mutual exclusivity,
as SRSF2 mutations functionally reduce EZH2 protein
expression.

In addition to the effects of mutant SRSF2 on EZH2 splicing
and protein expression, a number of other genes of known
importance in hematopoiesis and malignancy were also consistently
differentially spliced in isogenic human cells, primary patient
samples, and murine cells bearing mutant SRSF2. These
include additional genes mutated in MDS (such as BCOR), genes
with an importance in hematopoietic stem cell self-renewal (such
as IKAROS), and genes critical for cell survival (such as
CASPASE 8). Future efforts to understand the functional effects
of each of these specific splicing events will be important in
further delineating the effects of mutant SRSF2 on MDS pathogenesis
as well as possibly providing novel means for therapeutic
targeting of SRSF2 mutant cells.

Our studies, which reveal both mechanistic splicing alterations
and specific mis-spliced isoforms in SRSF2 mutant cells, may
provide insights into therapeutic opportunities for targeting
SRSF2 mutant cells. For example, the observations that mutant
SRSF2 promotes the inclusion of a poison exon in an ESE-dependent
manner and that restoration of normally spliced
EZH2 mRNA partially rescues defective hematopoiesis in
SRSF2 mutant cells suggest that normal cellular function may
be at least partially restored by manipulating specific pathologic
splicing events.

EXPERIMENTAL  PROCEDURES 

Generation of the Srsf2 P95H conditional knockin mice is described in the
Supplemental Experimental Procedures. All animal procedures were conducted in
accordance with the Guidelines for the Care and Use of Laboratory Animals
and approved by the Institutional Animal Care and Use Committees at Memorial
Sloan Kettering Cancer Center.


Patient  Samples 

Studies were approved by the Institutional Review Boards of Memorial Sloan
Kettering Cancer Center and Fred Hutchinson Cancer Research Center and
conducted in accordance with the Declaration of Helsinki protocol. Informed
consent was obtained from all human subjects.

mRNA  Sequencing 

For sorted mouse cell populations, K562 cells, and primary AML and CMML
samples, RNA was extracted using QIAGEN RNeasy columns. Poly(A)-selected,
unstranded Illumina libraries were prepared with a modified TruSeq protocol.
0.5x AMPure XP beads were added to the sample library to select for fragments
of <400 bp, followed by 1x beads to select for fragments of >100 bp. These
fragments were then amplified with PCR (15 cycles) and separated by gel
electrophoresis (2% agarose). 300-bp DNA fragments were isolated and sequenced
on the Illumina HiSeq 2000 (100 million 2 x 49 bp reads/sample).

RNA-Seq  Read  Mapping 

Reads were mapped to the University of California, Santa Cruz (UCSC)
hg19 (NCBI GRCh37) human genome or UCSC mm10 (NCBI GRCm38)
genome assemblies. First, a modified version of RNA-seq by expectation
maximization (RSEM) that called Bowtie v1.0.0 with the -v 2 argument
was created. This modified RSEM was then called with the arguments
‘--bowtie-m 100 --bowtie-chunkmbs 500 --calc-ci --output-genome-bam’
on the gene annotation file. Read alignments with mapping quality
(MAPQ) scores of 0 and/or a splice junction overhang of less than 6 bp
were then filtered out. The remaining unaligned reads were then
aligned by TopHat v2.0.8b with the arguments ‘--bowtie1 --read-mismatches
2 --read-edit-dist 2 --no-mixed --no-discordant --min-anchor-length
6 --splice-mismatches 0 --min-intron-length 10 --max-intron-length
1000000 --min-isoform-fraction 0.0 --no-novel-juncs --no-novel-indels
--raw-juncs’ on the splice junction file (--mate-inner-dist and --mate-std-dev
were calculated by mapping to constitutive coding exons with the
Mixture of Isoforms (MISO) exon_utils.py utility). The resulting TopHat
alignments were then filtered as for the RSEM-generated alignments. Finally,
the RSEM- and TopHat-created binary sequence alignment/map (BAM) files
were merged to create final BAM files.
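The alignment filter described above (discard MAPQ 0 and any splice junction overhang shorter than 6 bp) can be sketched in a few lines. This is only an illustration under stated assumptions, not the pipeline used in the study: the function names are hypothetical, the overhang of a junction is assumed to be the shorter of the two aligned blocks flanking the intron, and alignments are represented by just their MAPQ value and CIGAR string.

```python
import re

# One CIGAR operation: a length followed by an operation letter (SAM format).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def junction_overhangs(cigar):
    """For each intron ('N') in the CIGAR, return the shorter flanking aligned block.

    Assumption: 'overhang' means the number of aligned bases (M, =, X)
    in the block on either side of the junction.
    """
    blocks = [0]  # aligned bases in each block between introns
    for length, op in CIGAR_OP.findall(cigar):
        if op == "N":
            blocks.append(0)           # an intron starts a new aligned block
        elif op in "M=X":
            blocks[-1] += int(length)  # count only bases aligned to the reference
    return [min(left, right) for left, right in zip(blocks, blocks[1:])]


def keep_alignment(mapq, cigar, min_overhang=6):
    """Return False for MAPQ 0 or any junction overhang below min_overhang."""
    if mapq == 0:
        return False
    return all(o >= min_overhang for o in junction_overhangs(cigar))
```

An unspliced read such as `49M` has no junctions, so only its MAPQ matters; a read such as `5M1000N44M` is rejected because one side of the junction covers only 5 bp.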

Isoform  Expression  Measurements 

Two different methods were used to quantify isoform ratios. For alternative
splicing events from MISO’s v2.0 annotation, MISO was used to estimate isoform
ratios. For alternative splicing or intron retention of annotated constitutive
junctions, junction reads alone were used as described previously (Hubert
et al., 2013). To identify differentially expressed events, we required a minimum
of 20 identifying reads (supporting either, but not both, isoforms) per event as
well as a change in isoform ratio > 10%. For the LSK, MP, and K562 data, we
used two-sample statistical comparisons (Wagenmakers’ framework; Bayes
factor > 5). For the AML and CMML data, we used group statistical comparisons
(Mann-Whitney U test, p < 0.05). Real-time PCR was used to measure
EZH2 cassette exon inclusion as described in the Supplemental Experimental
Procedures.
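The read-count and effect-size thresholds above can be illustrated with a minimal sketch. It assumes that an isoform ratio is the fraction of identifying reads supporting the inclusion isoform and that the 20-read minimum is applied to each sample separately; both are assumptions for illustration, the function names are hypothetical, and the statistical tests (Bayes factor, Mann-Whitney U) are not reproduced here.

```python
def isoform_ratio(inclusion_reads, exclusion_reads):
    """Fraction of identifying reads that support the inclusion isoform."""
    total = inclusion_reads + exclusion_reads
    return inclusion_reads / total if total else None


def is_candidate_event(sample_a, sample_b, min_reads=20, min_delta=0.10):
    """Apply the coverage and effect-size thresholds to one splicing event.

    sample_a and sample_b are (inclusion_reads, exclusion_reads) tuples.
    Assumption: the read minimum must be met in each sample.
    """
    for inclusion, exclusion in (sample_a, sample_b):
        if inclusion + exclusion < min_reads:
            return False  # too few identifying reads to trust the ratio
    delta = abs(isoform_ratio(*sample_a) - isoform_ratio(*sample_b))
    return delta > min_delta
```

For example, an event with 30 inclusion and 10 exclusion reads in one sample and 10 and 30 in the other has ratios of 0.75 and 0.25 and passes both thresholds.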

ACCESSION  NUMBERS 

The  accession  number  for  the  RNA  sequencing  data  reported  in  this  paper  is 
Gene  Expression  Omnibus  (GEO):  GSE65349. 

SUPPLEMENTAL  INFORMATION 

Supplemental Information includes Supplemental Experimental Procedures,
six figures, and five tables and can be found with this article online at
http://dx.doi.org/10.1016/j.ccell.2015.04.006.
Myelodysplastic syndromes and chronic myelomonocytic leukemia (CMML) are characterized by mutations in genes encoding
epigenetic  modifiers  and  aberrant  DNA  methylation.  DNA  methyltransferase  inhibitors  (DMTis)  are  used  to  treat  these 
disorders,  but  response  is  highly  variable,  with  few  means  to  predict  which  patients  will  benefit.  Here,  we  examined  baseline 
differences  in  mutations,  DNA  methylation,  and  gene  expression  in  40  CMML  patients  who  were  responsive  or  resistant  to 
decitabine  (DAC)  in  order  to  develop  a  molecular  means  of  predicting  response  at  diagnosis.  While  somatic  mutations  did  not 
differentiate  responders  from  nonresponders,  we  identified  167  differentially  methylated  regions  (DMRs)  of  DNA  at  baseline 
that  distinguished  responders  from  nonresponders  using  next-generation  sequencing.  These  DMRs  were  primarily  localized 
to  nonpromoter  regions  and  overlapped  with  distal  regulatory  enhancers.  Using  the  methylation  profiles,  we  developed 
an  epigenetic  classifier  that  accurately  predicted  DAC  response  at  the  time  of  diagnosis.  Transcriptional  analysis  revealed 
differences  in  gene  expression  at  diagnosis  between  responders  and  nonresponders.  In  responders,  the  upregulated  genes 
included  those  that  are  associated  with  the  cell  cycle,  potentially  contributing  to  effective  DAC  incorporation.  Treatment  with 
CXCL4  and  CXCL7,  which  were  overexpressed  in  nonresponders,  blocked  DAC  effects  in  isolated  normal  CD34+  and  primary 
CMML  cells,  suggesting  that  their  upregulation  contributes  to  primary  DAC  resistance. 


Introduction 

Chronic myelomonocytic leukemia (CMML) is a myelodysplastic
syndrome/myeloproliferative neoplasm (MDS/MPN) overlap
syndrome (1) that was historically classified within MDS (2) until
2001 (3). CMML shares many characteristics with MDS, including
dysplasia in one or more myeloid cell lineages and increased risk
of transformation into acute myeloid leukemia (AML). However,
a distinguishing feature of CMML is the presence of persistent
peripheral monocytosis (>1 x 10^9/L). CMML can be subdivided
into 2 subtypes on the basis of blast count: CMML1, with less than
10% bone marrow (BM) blasts, and CMML2, which has between
10% and 19% blasts.

Substantial epigenetic abnormalities have been described in
both MDS and MDS/MPN. Mutations in epigenome-modifying
enzymes are highly prevalent in these disorders, including those
responsible for DNA methylation and demethylation, DNA
methyltransferase 3A (DNMT3A) (4) and ten-eleven translocation
2 (TET2) (5, 6), respectively, as well as those involved in
histone-modifying complexes, additional sex combs-like 1
(ASXL1) (7) and enhancer of zeste homolog 2 (EZH2) (8-11).
Although the precise mechanisms through which these mutations
drive the aberrant epigenetic changes observed in MDS are
still not completely understood, it has been shown that MDS and
MDS/MPN are characterized by DNA hypermethylation that
increases with disease severity (12, 13).

MDS and MDS/MPN are resistant to conventional chemotherapies;
however, epigenome-modifying drugs can be used successfully
as therapeutics to treat these disorders. In particular, the
nucleoside analogs azacytidine (AZA) and decitabine (DAC) are
commonly used to treat MDS and CMML (14, 15). Both AZA and
DAC are DNA methyltransferase inhibitors (DMTis), and while
their precise mechanism of action in treating MDS and MDS/MPN
remains a point of controversy, they are known to be incorporated
into DNA during the S phase, where they covalently trap
DNA methyltransferases and target them for proteasomal degradation
(16, 17). DMTis can also cause DNA damage (18), and
because AZA is mostly incorporated into RNA, it may have additional
effects on RNA processing and translation (19). Despite the
utility of DAC and AZA, only a subset of MDS and CMML patients
respond to them. Only approximately 50% of patients treated with
DMTis show a hematological improvement (HI) or better that is
associated with a survival benefit (20). Furthermore, as many as
6 months of treatment may be required for the therapeutic benefit
of DMTis to become apparent, thus forcing half of the patients
to undergo long periods of treatment before they can be deemed
resistant to this therapy. Currently, there are very few means of
predicting response versus resistance, and even these are exclusive
to AZA (21). Additionally, few alternative treatments exist
for patients who fail to respond to DMTis, and their prognosis is
extremely poor. Therefore, it is critical that we better understand
the molecular profiles associated with sensitivity and resistance to
DMTis in order to improve risk stratification strategies as well as
shed light on the mechanisms of resistance.

Table 1. Clinical characteristics of the FISM CMML patient cohort treated with DAC

Clinical characteristics            Responders        Nonresponders     P value
Total no. of patients               20                20
CMML1, no. (%)                      15 (75%)          10 (50%)          NS (A)
CMML2, no. (%)                      5 (25%)           10 (50%)
Male, no. (%)                       14 (70%)          14 (70%)          NS (A)
Female, no. (%)                     6 (30%)           6 (30%)
Median age, yr (range)              73.5 (45-84)      70.5 (41-82)      NS (B)
Median survival, mo (range)         26.5 (6-39)       13.5 (2-25)       P = 0.0004 (C)
Median hemoglobin, no. (range)      10 (7.2-14.9)     9.7 (6.6-13.8)    NS
Median marrow blasts, % (range)     5 (0-18)          7 (0-19)          NS (D)
Median monocytes, % (range)         24 (2-67)         22 (5-45)         NS (D)
Median wbc, % (range)               17.8 (3.7-75.2)   18.9 (2.8-52.5)   NS
Cytogenetics
  Normal                            14                14                NS (A)
  Abnormal                          6                 6
Splenomegaly                        9                 7                 NS (A)
Hepatomegaly                        8                 5                 NS (A)
Lymphadenomegaly                    2                 3                 NS (A)

(A) Fisher’s exact test; (B) Student’s t test; (C) log-rank test; (D) Wilcoxon rank-sum test.

While some studies have suggested that reversal of methylation
and/or transcript reexpression of certain loci was associated
with clinical response to DMTis (22-28), epigenetic studies to date
have failed to identify any strong correlation between response
to these agents and the presence of specific baseline DNA methylation
profiles (23, 26, 27, 29, 30). We hypothesized that this lack
of correlation was due to the promoter-centric nature of assays
used over the past decade and that methylation differences associated
with potential for therapeutic response were likely present
in these patients upon diagnosis at promoter-distal and intergenic
regulatory regions. In this study, we report, for the first time to our
knowledge, the identification of DNA methylation and expression
differences in diagnostic BM specimens from a cohort of CMML
patients treated with DAC. These differences, detected through
the use of genome-wide next-generation sequencing assays,
reveal underlying biological differences between these 2 groups of
patients and point to a novel mechanism of resistance to DMTis.

Results 

Somatic mutations do not correlate with response to DAC in CMML.
Somatic mutations in epigenome-modifying enzymes and other
genes are prevalent in MDS and CMML (4-6, 31-35). Recently,
it has been reported that mutations in TET2 and DNMT3A are
associated with improved response to DMTi therapy in MDS
and related disorders (36-38). Despite this, the presence of these
mutations did not translate to an improved overall
survival rate in any of these studies, indicating that
therapeutic response and survival benefit are likely
influenced by multiple different factors. Moreover,
these findings have not been recapitulated
in CMML exclusively (39). To determine whether
particular genetic or epigenetic abnormalities are
associated with DMTi sensitivity or resistance in
this disease, we studied a cohort of primary CMML
cases. BM mononuclear cells (BM MNCs) were
collected from 40 patients with de novo CMML at
the time of their diagnosis. All patients included
in this study were enrolled in a clinical trial conducted
by the FISM and received single-agent
treatment with DAC as frontline therapy (20 mg/m2/day
for 5 days), and response was evaluated
after 6 cycles of treatment. Responsive patients
(n = 20) were defined as those who achieved either
complete remission, marrow complete remission,
partial remission, or HI, as defined by the 2006
International Working Group (IWG) response criteria
for myelodysplasia (40). Patients with either
stable disease or progressive disease were considered
to have primary resistance to DAC (n = 20). As shown in
Table 1, there were no significant differences in terms of age,
gender, BM monocytosis, blast percentage, cytogenetics, or presence
of either splenomegaly or extramedullary lesions between
responder and nonresponder patients. Using MiSeq to sequence
DNA isolated from the diagnostic BM MNCs, we performed targeted
resequencing of the following panel of genes mutated at frequencies
greater than 5% in CMML: SRSF2, TET2, ASXL1, NRAS,
DNMT3A, RUNX1, U2AF1, TP53, JAK2, KIT, KRAS, SF3B1, EZH2,
IDH1, and IDH2. As in previous reports, SRSF2, TET2, and
ASXL1 were the most frequently mutated genes in this cohort of
patients (6, 32, 34, 35, 41-44). However, no somatic mutation was
significantly correlated with response to DAC in our cohort (Fisher’s
exact test, P = NS for all mutations) (Figure 1A and Table 2).
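As a worked illustration of the per-gene comparison, the sketch below computes a two-sided Fisher's exact test for a 2 x 2 table of mutated versus wild-type patients in nonresponders and responders. It is a plain standard-library version written for clarity, not the software used in the study, and the function name is hypothetical. The usage example takes the SRSF2 counts implied by the percentages in Table 2 (60% and 45% of 20 patients, i.e., 12 and 9 mutated cases).

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed one. Integer weights are compared,
    so ties are handled exactly.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def weight(x):
        # Unnormalized hypergeometric probability of x in the top-left cell.
        return comb(col1, x) * comb(n - col1, row1 - x)

    low = max(0, row1 - (n - col1))
    high = min(row1, col1)
    observed = weight(a)
    extreme = sum(weight(x) for x in range(low, high + 1) if weight(x) <= observed)
    return extreme / comb(n, row1)


# SRSF2: 12 of 20 nonresponders and 9 of 20 responders mutated.
p_srsf2 = fisher_exact_two_sided(12, 8, 9, 11)
print(round(p_srsf2, 2))  # 0.53
```

The same call with the TP53 counts (3 of 20 versus 0 of 20) gives approximately 0.23, in line with the value listed for that gene.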

We have previously shown, as have others, that distinct DNA
methylation profiles in AML and acute lymphoid leukemia (ALL)
are strongly correlated with the presence of specific molecular
and cytogenetic subtypes (12, 45-48). To determine whether
similarly distinct methylation patterns in CMML can be linked to
the presence of specific somatic mutations, we examined DNA
methylation patterns in the same specimens through enhanced
reduced representation bisulfite sequencing (ERRBS) (45), a
deep-sequencing method that captures and accurately quantifies
DNA methylation at approximately 3 million CpG sites. ERRBS
data were available for 39 of the 40 patients (19 nonresponders
and 20 responders). The percentage of methylation measured by
ERRBS was highly concordant with the findings of the quantitative
single-locus DNA methylation validation assay MassARRAY EpiTYPER
(ref. 49 and Supplemental Figure 1; supplemental material
available online with this article; doi:10.1172/JCI78752DS1).
Unsupervised clustering analysis of the patients based on their
DNA methylation patterns did not reveal a correlation between
gene mutations and particular methylation clusters (Figure 1B).
In addition, there was no significant difference in the observed
patient survival time between the 2 top-level methylation clusters
(log-rank test, P = 0.33).

Next, we performed supervised analyses comparing TET2,
ASXL1, DNMT3A, and SRSF2 WT and mutant cases to identify the
differentially methylated regions (DMRs) associated with each of
these mutations. As expected, given its role in de novo DNA methylation,
we identified a predominantly hypomethylated profile
associated with DNMT3A mutations (total DMRs: 243; hypomethylated
DMRs [hypo-DMRs]: 197; hypermethylated DMRs [hyper-DMRs]:
46) that was targeted mainly at intergenic and intronic
regions (Figure 2A). By contrast, TET2 loss-of-function mutations
were associated with the presence of hypermethylation compared
with that seen in TET2 WT cases (total DMRs: 188; hypo-DMRs:
48; hyper-DMRs: 140) (Figure 2B). Mutations in ASXL1, another
epigenetic modifier, were associated with a specific
signature consisting of equal proportions of hyper-
and hypo-DMRs (total DMRs: 144; hypo-DMRs: 82;
hyper-DMRs: 62). Both hyper- and hypo-DMRs in
ASXL1-mutant CMML cases were strongly depleted
at promoter regions (hyper-DMRs 3% vs. background
21%, P = 6.79 x 10^-5; hypo-DMRs 5% vs. background
21%, P = 4.30 x 10^-5) and significantly enriched at
intergenic regions (hypo-DMRs 54% vs. background
38%, P = 2.84 x 10^-3) (Figure 2C). Notably, mutations
in the splicing factor SRSF2 were linked to the
strongest DNA methylation differences, with a total
of 724 DMRs (hypo-DMRs: 383; hyper-DMRs: 341).
In this case, hypermethylated DMRs were strongly
enriched at promoter regions (hyper-DMRs 31%
vs. background 21%, P = 1.44 x 10^-5) and depleted
at introns (hyper-DMRs 19% vs. background 33%,
P = 1.50 x 10^-8) (Figure 2D). While SRSF2 itself does
not have any direct epigenetic function, it is likely
that mutations in this gene lead to mis-splicing and
the consequent deregulation of other epigenome-modifying
genes, resulting in this strong epigenetic
signature. Additionally, the observed survival time
was not significantly different between the patients
with or without individual DNMT3A, TET2, ASXL1,
and SRSF2 mutations (log-rank test, P = 0.61, 0.067,
0.93, and 0.58, respectively).

A specific epigenetic profile distinguishes DAC-resistant
CMML patients at diagnosis. Previous efforts by
many groups, including ours, have failed to identify
baseline epigenetic differences between DMTi-sensitive
and -resistant patients (12, 27, 30). However, all
of these studies were performed using platforms that
examined DNA methylation within CpG islands and
gene promoters. A growing body of recent evidence
suggests that DNA methylation and other epigenetic
modifications at enhancers and other distal regulatory regions
play a key role in transcriptional regulation and that these regions
are often located at a significant distance from the transcription
start site of the target gene (50). Therefore, we hypothesized that
key epigenetic differences may exist between DAC-sensitive and
-resistant patients at diagnosis that are located distally from promoters,
targeting enhancers and other distal regulatory regions.

For this purpose, we used the ERRBS assay, a deep-sequencing-based
method that targets not only promoter regions but
also intronic, exonic, and distal intergenic regions (45). Using the
MethylSig package, we performed a direct comparison between
the diagnostic DNA methylation profiles of DAC-sensitive and
DAC-resistant patients (51). We identified 167 DMRs that displayed
a methylation difference of 25% or more between responders
and nonresponders and that were statistically significant at an
FDR of less than 0.1. Among these DMRs were regions displaying
higher methylation in responders, as well as regions of lower
methylation as compared with those in nonresponders (Figure 3A
and Supplemental Table 1). Hierarchical clustering of our cohort
using these DMRs was sufficient to achieve a perfect segregation
of DAC-sensitive and -resistant patients (Figure 3B). These
findings indicate that numerous epigenetic differences exist at
the time of diagnosis that correlate with a patient’s likelihood of
responding to DAC treatment.

Table 2. Somatic mutations of the FISM cohort did not correlate with response

Mutation   Nonresponders (n = 20)   Responders (n = 20)   Total (n = 40)    P value (A)
SRSF2      60.0% (n = 12)           45.0% (n = 9)         52.5% (n = 21)    0.53
TET2       45.0% (n = 9)            40.0% (n = 8)         42.5% (n = 17)    1.0
ASXL1      35.0% (n = 7)            45.0% (n = 9)         40.0% (n = 16)    0.75
NRAS       20.0% (n = 4)            20.0% (n = 4)         20.0% (n = 8)     1.0
DNMT3A     15.0% (n = 3)            10.0% (n = 2)         12.5% (n = 5)     1.0
RUNX1      10.0% (n = 2)            10.0% (n = 2)         10.0% (n = 4)     1.0
U2AF1      10.0% (n = 2)            10.0% (n = 2)         10.0% (n = 4)     1.0
TP53       15.0% (n = 3)            0.0% (n = 0)          7.5% (n = 3)      0.23
JAK2       5.0% (n = 1)             5.0% (n = 1)          5.0% (n = 2)      1.0
KIT        5.0% (n = 1)             5.0% (n = 1)          5.0% (n = 2)      1.0
KRAS       0.0% (n = 0)             5.0% (n = 1)          2.5% (n = 1)      1.0
SF3B1      0.0% (n = 0)             5.0% (n = 1)          2.5% (n = 1)      1.0
EZH2       0.0% (n = 0)             5.0% (n = 1)          2.5% (n = 1)      1.0
IDH1       0.0% (n = 0)             5.0% (n = 1)          2.5% (n = 1)      1.0
IDH2       5.0% (n = 1)             0.0% (n = 0)          2.5% (n = 1)      1.0

(A) Fisher’s exact test.

Response-associated DMRs localize preferentially to distal regulatory
regions. Next, we sought to determine whether DMRs
were distributed evenly across the genome or whether they were
enriched at specific genomic regions. For this, we analyzed both
the genomic distribution of DMRs as well as their association
with known regulatory regions. Notably, our analysis of the distribution
of DMRs relative to coding regions revealed that DMRs
were significantly depleted at promoter regions (DMRs 10% vs.
background 21%, binomial test P = 6.70 x 10^-5), with a concurrent
enrichment at intronic regions, thus confirming our initial hypothesis.
This distribution was not the same across hyper- and hypo-DMRs.
While all DMRs were depleted at promoter regions, hyper-DMRs
were significantly enriched at introns (hyper-DMRs 49%
vs. background 33%, binomial test P = 1.29 x 10^-3), while hypo-DMRs
were enriched at intergenic regions (hypo-DMRs 49% vs.
38% background, binomial test P = 0.03) (Figure 4A).
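The enrichment and depletion P values above come from binomial tests that compare the fraction of DMRs in a genomic category with the background fraction of all tested regions. A minimal standard-library sketch of an exact two-sided binomial test follows. The function names are hypothetical, the two-sided definition used here (summing all outcomes no more likely than the observed one) is an assumption, and the exact DMR counts per category are not given in the text, so no attempt is made to reproduce the reported values.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_two_sided(k, n, p):
    """Exact two-sided binomial test.

    Sums the probabilities of all outcomes that are no more likely than
    the observed count k. A small relative tolerance absorbs floating
    point noise when two outcomes are equally likely.
    """
    cutoff = binomial_pmf(k, n, p) * (1 + 1e-9)
    total = sum(
        binomial_pmf(i, n, p)
        for i in range(n + 1)
        if binomial_pmf(i, n, p) <= cutoff
    )
    return min(1.0, total)
```

In this setting, `k` would be the number of DMRs overlapping a category (for example promoters), `n` the total number of DMRs (167), and `p` the background fraction (0.21 for promoters).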

Next, we sought to determine the association of DMRs with
regulatory regions. For this purpose, we analyzed their relative
enrichment at CpG island and enhancer regions. Analysis of CpG
islands and CpG shores demonstrated that DMRs were also significantly
depleted at CpG islands (DMRs 14% vs. background 25%,
binomial test P = 2.8 x 10^-4), with enrichment at CpG shores (DMRs
22% vs. background 15%, binomial test P = 8.79 x 10^-3). This pattern
was conserved across both hyper- and hypo-DMRs (Figure 4B).

Recently, DNA methylation at enhancers was reported to
strongly correlate with aberrant gene expression observed in
cancer cells (52). We hypothesized that differential
DNA methylation at enhancers, rather
than at promoters, may be better correlated
with differential response to DAC in CMML.
Enrichment analysis of all DMRs relative to
intragenic and intergenic enhancers revealed
that DMRs were enriched for intragenic
enhancers (DMRs 25% vs. background 18%,
binomial test P = 0.01). When this analysis
was stratified into hyper- and hypo-DMRs, it
became apparent that hyper-DMRs showed
the strongest enrichment at enhancer regions
and, in particular, at enhancers located within
gene bodies (hyper-DMRs 32% vs. background
18%, binomial test P = 8.14 x 10^-4). Conversely,
hypo-DMRs were not significantly enriched
at enhancer regions and were similarly distributed
within gene body and intergenic
enhancers (Figure 4C).

Finally, we asked whether the DMRs associated
with DAC response were specifically
enriched within relevant biological pathways. The 167 DMRs were
annotated to known genes, and pathway enrichment analysis was
performed against the KEGG pathway database. The MAPK signaling
pathway, which plays a key role in the cell cycle, apoptosis,
cell proliferation, and differentiation, was significantly enriched in
DMR-associated genes (hypergeometric test P = 7.68 x 10^-3, FDR
= 0.084) (Supplemental Figure 2A). There were 7 DMRs that were
annotated to MAPK pathway genes, including STMN1, CACNAE1,
PRKCB, MAPT, NFATC1, CRKL, and MKNK2 (Supplemental Table
2). Three of these DMRs, those annotated to STMN1, CACNAE1,
and MAPT, were hypermethylated in DAC nonresponders, while
MKNK2-, NFATC1-, CRKL-, and PRKCB-associated DMRs were
hypermethylated in DAC responders (summarized in Supplemental
Table 2). To further validate epigenetic deregulation of the
MAPK signaling pathway in these patients, we performed MassARRAY
EpiTYPER analysis of 3 of the affected MAPK genes in
the pathway in a subset of samples (Supplemental Figure 2B). This
analysis confirmed the increased methylation in the STMN1 and
CACNAE1 DMRs in nonresponder patients, as well as validated
the increased methylation of the NFATC1 DMR in responders.

DNA  methylation  differences  can  be  harnessed  for  therapeutic 
response  prediction.  Given  that  our  data  identified,  for  the  first  time 
to  our  knowledge,  the  existence  of  baseline  DNA  methylation  dif¬ 
ferences  between  DAC  responders  and  nonresponders  prior  to 
DAC  treatment,  we  hypothesized  that  these  unique  methylation 
profiles  could  be  harnessed  to  predict  at  the  time  of  diagnosis  which 
patients  would  be  sensitive  or  resistant  to  treatment.  To  test  this,  we 
used  the  percentage  of  cytosine  methylation  at  each  genomic  loca¬ 
tion  among  patients  in  the  FISM  cohort  (cohort  1)  as  potential  pre¬ 
dictors  and  applied  a  machine-learning  approach,  support  vector 
machine  (SVM)  (53),  to  build  a  classifier  (see  details  in  Methods). 
Twenty-one  25 -bp  tile  regions  were  identified  by  feature  selection 
as  the  predictors  with  the  highest  predictability  in  the  SVM  classi¬ 
fier  (Figure  5A,  Supplemental  Figure  3A,  and  Supplemental  Table 

3) .  Unsupervised  analysis  using  only  the  methylation  levels  at  the 
21  selected  tile  regions  revealed  that  they  were  sufficient  to  almost 
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Figure  2.  Distinct  DNA  methylation  profiles  are  associated  with  recurrent  somatic  mutations  in  DNMT3A,  TET2,  ASXLI,  and  SRSF2.  Volcano  plots 
illustrating  the  methylation  differences  between  DA/MT3A-mutant  (n  =  5)  (A),  TET2- mutant  (n  =  17)  (B),  ASXL7-mutant  (n  =  15)  (C),  or  5R5F2- mutant 
{n  =  21)  (D)  samples  versus  WT  patients  {n  =  39  for  the  number  of  mutated  samples).  DMRs  are  indicated  by  red  dots  (beta-binomial  test,  FDR  <0.1  and 
absolute  methylation  different  >25%).  Pie  charts  illustrate  the  relative  proportion  of  CpG  tiles  and  DMRs  annotated  to  the  RefSeq  promoter,  exonic, 
intronic,  and  intergenic  regions. 


separate the 39 samples by response (Figure 5B and Supplemental Figure 3, B and C). There was no
defined clustering of the patients according to their specific degree of response as shown by
multidimensional scaling (MDS) analysis (Supplemental Figure 3C), which is concordant with the
fact that the classifier was built to identify an all-or-nothing response versus no response and
not to distinguish between types of responses. Ten-fold cross-validation was performed using the
cases from cohort 1 to evaluate the predictive performance of the classifier, and the reported
area under the receiver operating characteristic curve (ROC-AUC) was 0.99, indicating a strong
predictive accuracy for the classifier model (Supplemental Figure 3D). In order to further assess
the robustness of the SVM classifier built with the 21 selected features, we performed 3
different random splits of the same cohort 1 into training and test sets. We trained the
classifier on each of the 3 sets of randomly selected samples and predicted the responses for the
remaining samples in the cohort. The classifier was able to accurately predict response to DAC in
18 of 19 (accuracy = 94.74%) (Table 3), 13 of 14 (accuracy = 92.86%), and 9 of 9 (accuracy =
100%) patients, respectively (Supplemental Figure 4A).
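
The accuracy quoted for the first random split can be recomputed from the per-patient labels and predictions listed in Table 3. The snippet below is a minimal sketch of that arithmetic only; it does not train an SVM, and the function and variable names are illustrative rather than taken from the original analysis.

```python
from collections import Counter

# (patient ID, original label, prediction) as listed in Table 3.
TABLE_3 = [
    ("1002", "NR", "NR"), ("0402", "NR", "R"), ("0501", "NR", "NR"),
    ("0502", "R", "R"), ("0103", "R", "R"), ("0105", "R", "R"),
    ("0205", "NR", "NR"), ("0202", "R", "R"), ("1301", "NR", "NR"),
    ("1302", "NR", "NR"), ("1101", "R", "R"), ("0204", "NR", "NR"),
    ("0507", "NR", "NR"), ("0802", "R", "R"), ("0404", "NR", "NR"),
    ("0108", "R", "R"), ("1103", "R", "R"), ("0901", "R", "R"),
    ("0701", "R", "R"),
]


def accuracy_percent(rows):
    """Percentage of rows whose prediction matches the original label."""
    correct = sum(1 for _, label, pred in rows if label == pred)
    return 100.0 * correct / len(rows)


def confusion_counts(rows):
    """Count (original label, prediction) pairs."""
    return Counter((label, pred) for _, label, pred in rows)


acc = accuracy_percent(TABLE_3)
counts = confusion_counts(TABLE_3)
print(f"accuracy = {acc:.2f}%")
```

The confusion counts show that the single error is a nonresponder predicted as a responder.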

Figure 3. Baseline DNA methylation differences distinguish DAC responders and nonresponders at
the time of diagnosis. (A) Volcano plot illustrating methylation differences between 20
DAC-sensitive and 19 DAC-resistant patients. Mean methylation difference between the 2 groups is
represented on the x axis and statistical significance (-log10 P value) on the y axis.
Beta-binomial test identified 167 DMRs, which are indicated by red dots (FDR <0.1 and absolute
methylation difference >25%). (B) Hierarchical clustering of the patients using the 167 DMRs
illustrates the power of these genomic regions in segregating the patients into nonresponder
(blue) and responder (red) groups.

Since validation in an independent cohort of patients is the gold standard for biomarker
development, we identified a second cohort of patients in which to test the performance of our
SVM classifier. Twenty-eight additional diagnostic CMML specimens from patients enrolled in a
clinical trial from the Groupe Francophone des Myelodysplasies (GFM), all of whom had been
treated with the same DAC regimen of 20 mg/m²/day for 5 days, were collected and subjected to
ERRBS (Table 4 and Supplemental Table 4). Specimens from this second cohort (cohort 2) of 12
responder and 16 nonresponder patients consisted of sorted monocytes from peripheral blood (PB).
The SVM classifier that had been developed using cohort 1 was applied blindly to these samples,
without any prior knowledge of the therapeutic response labels for this second cohort. Due to the
stochastic nature of ERRBS, the CpG coverage is never identical across all samples, thus leading
to missing values for some regions of interest. In effect, only 6 of the 21 features were present
in all 28 samples in cohort 2. Therefore, using only these 6 features, we first trained our SVM
classifier on the 39 samples of the FISM cohort (cohort 1) and then applied the trained
classifier on the GFM cohort (cohort 2). As shown in Table 5 and Supplemental Figure 4B, despite
this limitation, the 6-feature classifier was still capable of correctly predicting response in
20 of 28 patients in the GFM cohort (accuracy = 71% and AUC = 0.82). Next, in order to increase
the number of features being tested while still retaining a large enough cohort in which to test
the predictive accuracy, we used 14 of the 21 features of the SVM classifier to predict response
for 19 patients in the GFM cohort. Once again, we used only these 14 features to train the model
on cohort 1, which consisted of the initial 39 patients, and then blindly applied the model to
the 19 test samples from the GFM cohort (cohort 2). This modified classifier with 14 features was
capable of accurately predicting therapeutic outcome for 15 of the 19 patients, which represents
an accuracy of 79% and an AUC of 0.83 (Table 5 and Supplemental Figure 4B). Finally, we
determined that of the original 21 features, 16 was the maximum number of features shared by at
least 15 of the cohort-2 patients. We trained the model on cohort 1 using only these 16 shared
features and then predicted response for the 15 patients in the independent cohort 2, achieving
an accuracy of 87% with an AUC of 0.94 (Table 5 and Supplemental Figure 4B). These findings
demonstrate that the SVM classifier developed using the original FISM cohort is general enough to
be applied to and accurately predict the therapeutic outcome of fully independent samples (i.e.,
GFM cohort 2), which is a critical step in the development of a biomarker. Moreover, this
robustness was maintained even across different cell types (BM MNCs in cohort 1 vs. PB monocytes
in the validation cohort 2), further underscoring the power of the classifier to predict outcome
in an independent cohort. While further validation in larger cohorts will be required to fully
assess the accuracy of the features reported here, and additional studies of larger cohorts might
help refine the selection of features to include those with the strongest accuracy over a large
number of patients, our findings demonstrate that the epigenetic differences between responders
and nonresponders at diagnosis have the potential to be harnessed as classifiers to predict
clinical response to DAC.
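
The feature-subsetting step described in this paragraph (restricting the classifier to CpG tiles that are covered in every validation sample) amounts to a set intersection. The following is a sketch under stated assumptions, not the authors' code: the tile names, sample names, and helper functions are all hypothetical.

```python
def tiles_covered_in_all(coverage):
    """Return the tiles that have a methylation value in every sample.

    coverage maps a sample ID to the set of classifier tiles covered in it.
    """
    sample_sets = list(coverage.values())
    if not sample_sets:
        return set()
    return set.intersection(*sample_sets)


def samples_covering(coverage, tiles):
    """Return the samples in which every tile of `tiles` is covered."""
    wanted = set(tiles)
    return sorted(s for s, covered in coverage.items() if wanted <= covered)


# Hypothetical toy data: 4 classifier tiles, 3 validation samples.
coverage = {
    "sample_1": {"tile_a", "tile_b", "tile_c"},
    "sample_2": {"tile_a", "tile_c", "tile_d"},
    "sample_3": {"tile_a", "tile_b", "tile_c", "tile_d"},
}

shared = tiles_covered_in_all(coverage)         # tiles usable for all 3 samples
larger_subset = {"tile_a", "tile_b", "tile_c"}  # more tiles, fewer samples
usable = samples_covering(coverage, larger_subset)
print(sorted(shared), usable)
```

The same trade-off appears in the text: asking for more shared tiles reduces the number of patients that can be scored (6 features for 28 patients, 14 for 19, and 16 for 15).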

DAC sensitivity can be linked to a specific transcriptional program at diagnosis. While it has
been previously shown that reduced expression of uridine-cytidine kinase, an enzyme involved in
nucleoside metabolism, is associated with response to AZA in MDS (54), we did not find that
differential expression of this or other DMTi-metabolizing enzymes was associated with response
to DAC in CMML (data not shown). Therefore, we sought to determine whether other transcriptional
differences between DAC responders and nonresponders are indicative of response and can provide
insight on functional pathways that contribute to DAC resistance. We performed RNA-sequencing
(RNA-seq) on samples from 14 patients (8 responders and 6 nonresponders) in the cohort of CMML
patients for whom we had high-quality RNA. Prior to performing differential analysis, we
validated the ability of our RNA-seq approach to accurately detect quantitative variability by
performing quantitative reverse transcriptase PCR (qRT-PCR) on RNAs from 13 of the 14 patients
and determining the degree of agreement between the 2 methods (r = 0.85, R² value = 0.73,
P < 0.0001) (Supplemental Figure 5A). As shown in Figure 6A, a direct comparison of the 2 groups
of patients identified 601 genes with an absolute log2 fold change greater than 1 and a P value
of less than 0.05. Notably, this gene signature consisted of a majority of genes overexpressed in
DAC-sensitive patients (405 upregulated genes), with only a small proportion of genes
downregulated in these patients (Supplemental Table 5).
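
The selection rule stated above (absolute log2 fold change greater than 1 and P value below 0.05) can be illustrated with a short filter. This is only a sketch: the gene names and values are invented, the helper name is hypothetical, and the sign convention (positive means higher in DAC-sensitive patients) is an assumption made for the example.

```python
def split_by_direction(results, min_abs_log2fc=1.0, max_p=0.05):
    """Split significant genes by direction of change.

    results: iterable of (gene, log2_fold_change, p_value), where a positive
    log2 fold change means higher expression in DAC-sensitive patients
    (sign convention assumed for this sketch).
    """
    up, down = [], []
    for gene, log2fc, p in results:
        if p < max_p and abs(log2fc) > min_abs_log2fc:
            (up if log2fc > 0 else down).append(gene)
    return up, down


# Hypothetical values, for illustration only.
toy_results = [
    ("GENE_A", 2.3, 0.001),   # up in responders
    ("GENE_B", -1.7, 0.020),  # down in responders
    ("GENE_C", 0.4, 0.003),   # significant P, but fold change too small
    ("GENE_D", 3.1, 0.200),   # large change, but P not below 0.05
]

up, down = split_by_direction(toy_results)
print(up, down)
```

With the counts reported above, 405 of the 601 signature genes are upregulated in DAC-sensitive patients, which by subtraction leaves 196 in the other direction.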

In order to identify biological differences that might explain the difference between these
patients in their therapeutic response to DAC, we performed gene set enrichment analysis (GSEA)
(55). Gene sets enriched in DAC-sensitive patients at an FDR of less than 0.1 were involved in
proliferation, cell cycle activity, and DNA replication (Figure 6B). Likewise, genes reported as
being downregulated in quiescent versus dividing CD34+ cells (56) were found to be upregulated in
DAC responders. This enrichment of gene sets involved in the cell cycle and in DNA replication in
DAC-sensitive patients is consistent with the requirement for DAC incorporation into the DNA
during the S phase.

Figure 4. DMRs are enriched at distal intergenic regions and enhancers. (A) Pie charts illustrate
the relative proportion of CpG tiles and DMRs annotated to RefSeq promoter, exonic, intronic, and
intergenic regions. (B) Pie charts illustrate the relative proportion of CpG tiles and DMRs
annotated to CpG islands, CpG shores, and regions beyond CpG shores. (C) Pie charts illustrate
the relative proportion of CpG tiles and DMRs annotated to enhancers within gene bodies,
enhancers within intergenic regions, and nonenhancer regions.

Primary resistance to DAC is associated with overexpression of ITGβ3 and the chemokines CXCL4 and
CXCL7. As mentioned above, only a small fraction of genes were found to have at least a 2-fold
overexpression in DAC-resistant patients. Among these, 3 genes that have previously been
implicated in chemoresistance and leukemogenesis were overexpressed in nonresponders: CXCL4 (also
known as PF4), CXCL7 (also known as PPBP), and integrin β3 (ITGB3) (Figure 6C). Thus, we
hypothesized that overexpression of these genes might be a potential mechanism through which CMML
acquires resistance to DAC. First, as shown in Figure 7A, we validated the overexpression of
these genes in DAC-resistant patients by qRT-PCR. Notably, there was a statistically significant
linear correlation between the levels of CXCL4 and CXCL7 expression by both RNA-seq (r = 0.9350,
R² = 0.87, P < 0.0001) and qRT-PCR (r = 0.9865, R² = 0.9731, P < 0.0001), suggesting that these
factors act in concert in the BM microenvironment (Figure 7B). While both chemokines were
originally thought to be produced exclusively by megakaryocytes, there is evidence that monocytes
(57, 58) and other cells within the BM also produce CXCL4 and CXCL7 (refs. 59, 60, and
Supplemental Figure 5, B and C). To further confirm the overexpression of these chemokines in
nonresponder patients as well as to determine the cellular source and localization of the
proteins in the BM, IHC was performed on a subset of paraffin-embedded BM biopsies taken at
diagnosis from responders and nonresponders. As shown in Figure 7, C and D, CXCL4 was primarily
localized to megakaryocytes, while CXCL7 staining was stronger in an MNC population compatible
with a monocytic origin. Importantly, there was increased CXCL4 and CXCL7 staining in BM from
nonresponder patients as compared with that in BM from responders, confirming the presence of
CXCL4 and CXCL7 proteins in the BM microenvironment, which, like mRNA levels, are increased in
DAC-resistant patients.
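
The correlation statistics quoted for CXCL4 and CXCL7 are Pearson coefficients, and for a simple linear relationship R² is the square of r. The sketch below shows that calculation on invented numbers; the expression values and the function name are hypothetical and are not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical expression values for two genes across five samples.
gene_1 = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_2 = [2.1, 3.9, 6.2, 8.1, 9.8]

r = pearson_r(gene_1, gene_2)
r_squared = r ** 2
print(f"r = {r:.4f}, R^2 = {r_squared:.4f}")
```

The same relationship holds for the RNA-seq values reported above, since 0.9350 squared is approximately 0.87.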

Previous studies have implicated serum levels of CXCL4 and CXCL7 as potential prognostic markers
in MDS (61, 62). To determine whether serum levels of CXCL4 and CXCL7 could potentially serve as
biomarkers for DAC response, we quantified the serum concentrations of these chemokines by ELISAs
in 35 of 40 CMML patients (Supplemental Figure 6). There was no significant difference in serum
CXCL4 or CXCL7 levels between responders and nonresponders. In addition, we found no significant
correlation between BM mRNA levels and serum protein levels for these 2 chemokines, indicating
that serum levels of these chemokines are not reflective of mRNA expression in the BM and
mirroring previous observations documented for other chemokines in the BM and serum of AML
patients (63, 64).

Figure 5. Methylation profiles can be harnessed to classify patients according to DAC response at
diagnosis. (A) Heatmap of 21 CpG tiles selected as the SVM classifier predictors. DAC-sensitive
patients are indicated with the red bar and nonresponders with the blue bar. (B) Correspondence
analysis (COA) using only the 21 CpG tiles included in the classifier could segregate the
majority of the CMML cohort according to DAC response (responders are represented by red dots and
nonresponders by blue dots).

CXCL4 and CXCL7 abrogate the effect of DAC on hematopoietic cells. It has been previously reported
that both CXCL4 and CXCL7 can reduce the chemosensitivity of BM cells to 5-fluorouracil in vitro
(65), and CXCL4 has been implicated in cell cycle arrest (66) and quiescence (67, 68), which might
be a mechanism through which it acts to prevent sufficient incorporation of DAC into cells of
nonresponders. Therefore, we hypothesized that an overabundance of CXCL4 and CXCL7 in the BM
microenvironment acts to overcome the effects of DAC. To test this, we cultured primary human
CD34+ cells for 3 days in vitro with CXCL4 (50 ng/ml), CXCL7 (50 ng/ml), or a combination of both
chemokines in either the presence or absence of low-dose DAC (10 nM) and then plated them in
methylcellulose to test their clonogenic potential. The chemokines and low-dose DAC did not
affect cell proliferation during the in vitro liquid culture period (Supplemental Figure 7A).
Moreover, as previously reported, low-dose DAC did not reduce cell viability or induce apoptosis
after 3 days in culture (Supplemental Figure 7, B and C, and ref. 69). However, 3 days of
treatment with 10 nM DAC significantly reduced colony formation. Addition of either CXCL4 or
CXCL7 alone did not have a significant impact on DAC-induced colony inhibition. However,
concomitant treatment of CD34+ cells with CXCL4 and CXCL7 completely abolished the suppressive
effect of DAC on colony formation (Figure 8A).

Finally, we tested the ability of CXCL4 and CXCL7 to induce resistance in primary CMML cells. BM
MNCs from diagnostic specimens collected from 3 patients were placed in liquid culture and
treated for 72 hours with 10 nM DAC in the presence or absence of 50 ng/ml CXCL4, CXCL7, or a
combination of both. Viability was assessed after 72 hours. Unlike normal CD34+ cells, which did
not show diminished viability with 10 nM DAC (Supplemental Figure 5B), treatment of primary CMML
cells with low-dose DAC led to a significant decrease in viability in all 3 patients (P < 0.01).
However, concomitant treatment of CMML cells with CXCL4, CXCL7, or their combination abrogated
the effect of DAC on all 3 patients (Figure 8B). Combined, these data support the hypothesis that
the presence of excess CXCL4 and CXCL7 in the marrow microenvironment contributes to induction of
DAC resistance in CMML cells.

Discussion 

While DMTis remain the only FDA-approved therapy for the majority of MDS and nonproliferative CMML
patients, prognosis following DMTi treatment failure is extremely poor, with median survival for
these patients barely reaching 6 months and approximately 50% of patients never even achieving a
response in the first place (20, 70). This relatively low rate of therapeutic response is further
complicated by the slow kinetics of DMTis, which may take as long as 6 to 12 months to show
efficacy, thus committing the majority of patients to receive a drug to which they will
ultimately be deemed resistant. Therefore, we set out to study the epigenetic and transcriptional
characteristics associated with response to DAC in a cohort of CMML patients in order to identify
molecular features that allow risk stratification at the time of diagnosis and, additionally, to
explain the mechanisms behind the primary resistance to this agent. To better understand the
molecular and mechanistic basis for DMTi response and effectively risk-stratify
patients at diagnosis, we performed next-generation sequencing assays to study both the epigenome
and the transcriptome of a uniformly treated cohort of CMML patients who differed in their
response to DAC. The use of this improved technology, with extended genomic coverage and better
dynamic range, allowed us to detect, for the first time to our knowledge, the presence of DNA
methylation and gene expression differences present at the time of diagnosis that distinguish
DMTi-sensitive and -resistant patients. The enrichment of these DMRs at distal enhancers, as well
as the depletion of promoter-associated DMRs identified in this baseline epigenetic signature,
underscores the importance of analyzing DNA methylation changes beyond promoter regions and
explains the lack of statistically significant differential methylation observed in previous
studies that were confined solely to promoter methylation analysis (12, 27, 30).

Table 3. Prediction performance of the SVM classifier trained on 20 randomly selected samples and
applied to the remaining 19 samples in the FISM cohort (accuracy = 94.74%)

Patient ID    Original label    Prediction
1002          NR                NR
0402          NR                R*
0501          NR                NR
0502          R                 R
0103          R                 R
0105          R                 R
0205          NR                NR
0202          R                 R
1301          NR                NR
1302          NR                NR
1101          R                 R
0204          NR                NR
0507          NR                NR
0802          R                 R
0404          NR                NR
0108          R                 R
1103          R                 R
0901          R                 R
0701          R                 R

NR, nonresponder; R, responder. *Incorrect prediction.

Table 4. Clinical characteristics of the GFM CMML cohort treated with DAC

Clinical characteristics           Responders         Nonresponders       P value
Total no. of patients              12                 16
CMML1, no. (%)                     2 (17%)            10 (62.5%)          P = 0.0235 (A)
CMML2, no. (%)                     10 (83%)           6 (37.5%)
Male, no. (%)                      9 (75%)            13 (81%)            NS (A)
Female, no. (%)                    3 (25%)            3 (19%)
Median age, yr (range)             72.5 (61-88)       71 (55-85)          NS (B)
Median survival, mo (range)        39 (8-95)          14.5 (5-67)         NS (C)
Median hemoglobin, % (range)       9.1 (6.7-13.3)     9.05 (8-12.2)       NS (A)
Median marrow blasts, % (range)    14 (3-20)          9 (4-19)            NS (D)
Median monocytes, % (range)        23 (2-47)          15.5 (3-34)         NS (D)
Median WBC, % (range)              18.9 (4.9-77.5)    24.95 (4.1-81.7)    NS (A)
Cytogenetics
  Normal                           7                  11                  NS (A)
  Abnormal                         5                  5

(A) Fisher's exact test; (B) Student's t test; (C) log-rank test; (D) Wilcoxon rank-sum test.

Moreover, our observation that the genomic locations predominantly affected by differential DNA
methylation are distal regulatory regions adds more data to the strong evidence that emphasizes
the critical role of long-range epigenetic gene regulation. Techniques to examine 3D chromatin
architecture, such as chromosome conformation capture (3C) (71) and its subsequent iterations 4C
(72, 73), 5C (74), and Hi-C (75), have indicated that gene regulation often occurs at very distant
locations, in part through DNA looping at distal enhancers. In fact, only a small percentage (~7%)
of gene-looping events have been reported to involve the nearest gene transcription start site
(50). This argues for the critical role of distal, nonpromoter regulatory regions in controlling
gene expression. If the differential methylation at nonpromoter regions does impact the
expression of long-range target genes, this may explain why several previous studies have
struggled to correlate differential DNA methylation with gene expression changes using
nearest-gene annotations (30, 76).

We found that the MAPK pathway was significantly enriched in DMRs, with both gains and losses of
methylation in responders and nonresponders within this pathway. These DMRs were localized to
both intra- and intergenic genomic regions annotated for 7 genes involved in the MAPK pathway.
While in-depth functional analysis of these DMRs will be required in additional experiments that
are beyond the scope of our study, our findings support the results by others suggesting the
importance of aberrant MAPK pathway signaling in contributing to MDS/MPN (77, 78), as well as to
drug resistance and cell cycle progression in leukemic cells (79, 80). Furthermore, while it is
known that multiple genes in the MAPK pathway can be mutated in CMML (81), our results indicate
that the epigenetic alterations of genes in this pathway may also be present in CMML patients.

While previous reports on MDS and related malignancies have linked the presence of certain
mutations, specifically TET2 (36-38) and DNMT3A (37), to an increased rate of response to DMTis,
we could not find any correlation between the mutational status of these and other genes commonly
mutated in CMML and response to DAC in our FISM cohort. This finding is in concordance with those
of a previous report on CMML (39), which likewise failed to detect a correlation between response
to DAC and mutational status, indicating that the impact of mutational status may be different in
CMML patients compared with that in MDS patients or in mixed cohorts consisting of MDS patients
as well as patients with other myeloid malignancies, including AML (37, 38) and MDS/MPN (37).
Furthermore, the studies demonstrating better TET2- and DNMT3A-associated responses involved
patients treated with AZA alone (38) or cohorts including both AZA- and DAC-treated patients (36,
37), which may also contribute to the differing result obtained in our study on patients who
received DAC exclusively.

Conversely, DNA methylation status was indeed different at diagnosis between DAC-sensitive and
DAC-resistant patients, and we demonstrate that these differences can risk-stratify patients at
the time of diagnosis using an epigenetic classifier that exploits these identified methylation
differences. Moreover, the SVM classifier developed in this study performed with 87% accuracy on
an independent cohort, even when only a subset of the original features were included and 2
different cell types were used in the training and validation cohorts (BM MNCs vs. PB monocytes).
Thus, while the classifier reported here will require further extensive validation in larger,
independent cohorts, the present study demonstrates not only that DNA methylation differences
exist between patients with different responses to DAC but that these DNA methylation differences
are sufficiently robust to be harnessed for use in the clinic as accurate classifiers. These
classifiers have the potential to prevent patients who are unlikely to respond to DAC from
receiving prolonged, unwarranted treatments with this drug and instead permit them to be quickly
transitioned to alternative therapies.

Table 5. Summary of the prediction performance of the independent validation cohort (GFM) in 3
scenarios using an increasing number of shared features of the 21 features preselected from the
FISM cohort

Number of features used    Correct predictions/Total patients    Accuracy (%)
16                         13/15                                 87%
14                         15/19                                 79%
6                          20/28                                 71%

Figure 6. A specific transcriptional program is associated with response to DAC. (A) Heatmap
illustrates gene expression differences between 8 DAC-sensitive (indicated by the red bar) and 6
DAC-resistant patients (indicated by the blue bar). Genes represented in the heatmap were
identified by a GLM likelihood ratio test (P < 0.05 and absolute log2 fold change >1). (B)
Enrichment plots for GSEA using the expression difference-ranked gene list showing enrichment for
cell cycle-related gene sets. NES, normalized enrichment score. (C) Box plots showing gene
expression differences for CXCL4, CXCL7, and ITGB3 (red box plots denote responders; blue box
plots denote nonresponders). P values were obtained from a GLM likelihood ratio test.


Figure 7. CXCL4 and CXCL7 are upregulated in the BM of nonresponders. (A) qRT-PCR showing
validation of overexpression of CXCL4, CXCL7, and ITGB3 in nonresponders; each point represents
the mean of triplicate wells for each patient sample; the line and error bars indicate the group
mean and SD, respectively. (B) Pearson's correlation analysis of expression levels of CXCL7 and
CXCL4 by RNA-seq and qRT-PCR. (C and D) Representative IHC images for CXCL4 (C) and CXCL7 (D) in
diagnostic BM biopsies in DAC responders and nonresponders. Original magnification, ×40 (C and D,
left panels), ×63 (C and D, right panels). Representative images from duplicate experiments are
shown.

In addition to epigenetic differences, our study also revealed baseline differences at the
transcriptional level that correlated with response to DAC. Analysis of this response-associated
signature demonstrated a strong enrichment for gene sets involved in cell cycle regulation among
the genes upregulated in DMTi-sensitive patients. This finding is in line with the need for DAC
to be incorporated into the DNA during cell cycle activity in order to exert its effects. By
contrast, fewer genes were upregulated in resistant patients. Among these overexpressed genes, we
found CXCL4 and CXCL7, two chemokines that have been previously implicated in mediating cell
cycle arrest (66), quiescence (67, 68), and reduced chemosensitivity of BM cells to 5-fluorouracil
in vitro (65). We therefore focused our efforts on studying the impact of these chemokines on
response to DAC. In vitro treatment of either normal CD34+ cells or primary CMML MNCs with CXCL4
and CXCL7 blocked the effect of DAC on these cells, indicating that overexpression of these 2
genes may indeed lead to primary resistance to DAC and opening the possibility for future
targeting of the downstream signaling cascades in order to overcome the effect of these
chemokines.

Methods 

Sample  collection  and  processing 

FISM cohort. BM specimens were collected before treatment from 40 patients with CMML. BM MNCs were
isolated through Ficoll density centrifugation and viably frozen in 10% DMSO and 90% FBS. Patients
with advanced CMML were enrolled in the nonrandomized clinical trial conducted by the FISM
(NCT01251627; https://clinicaltrials.gov/) and were given DAC (20 mg/m²/day i.v.) for 5 days
every 28 days for at least 6 cycles prior to being classified as responders or nonresponders,
with response defined as HI or better according to IWG 2006 criteria (40). The clinical
characteristics of the patients are summarized in Table 1. Genomic DNA and total RNA were
isolated using the AllPrep DNA/RNA kit (QIAGEN) according to the manufacturer's instructions.

GFM cohort. The patients were enrolled in the EudraCT 2008-000470-21 GFM trial (NCT01098084;
https://www.clinicaltrials.gov/) and received DAC (20 mg/m²/day i.v.) for 5 days every 28 days
for at least 3 cycles. Blood samples were collected using EDTA-containing tubes, mononucleated
cells were isolated on Ficoll-Hypaque, and monocytes were enriched using the AutoMacs system
(Miltenyi Biotec) through negative selection with microbeads conjugated with antibodies against
CD3, CD7, CD16, CD19, CD56, CD123, and glycophorin A, then further enriched by positive selection
with microbeads conjugated with a monoclonal mouse anti-human CD14 antibody (Miltenyi Biotec).
Genomic DNA was extracted from the monocytes using the Norgen Biotek kit (Thorold) according to
the manufacturer's instructions. The clinical characteristics of the patients are summarized in
Table 4.

Figure 8. CXCL4 and CXCL7 promote resistance to DAC in CD34+ and primary CMML specimens. (A)
Colony formation was inhibited by DAC but restored with the combination of CXCL4 and CXCL7. CD34+
cells were treated with 1 dose of CXCL4, CXCL7, or both (50 ng/ml each) or with vehicle (PBS
containing 0.1% BSA) and daily 10-nM doses of DAC for 3 days. After 3 days of in vitro treatment
with DAC, cells were plated in methylcellulose and incubated for 12 to 15 days before colonies
were counted. Data represent the mean ± SD. Treatment with 10 nM DAC significantly decreased
colony formation but failed to do so in the presence of CXCL7 and CXCL4 together. Shown in the 3
panels are the results of 3 independent experiments. Error bars represent the SD. (B) CXCL4 and
CXCL7 abrogated the effect of DAC on the viability of primary CMML MNCs. CMML MNCs were treated in
vitro for 72 hours with 10 nM DAC alone or in the presence of 50 ng/ml CXCL4, CXCL7, or both. Data
represent the mean ± SD. Treatment with DAC alone significantly reduced the viability of these
cells, but this effect was lost when CXCL4 or CXCL7 was added to the culture. All data represent
independent experiments performed in 3 different CMML patients. Error bars represent the SD.
*P < 0.05 and **P < 0.01 by unpaired 2-tailed Student's t test.

Mutational  sequencing 

Target capture. Capture of the target regions (exons plus splice junctions) was carried out using
a custom-designed HaloPlex Target Enrichment kit (Agilent Technologies) following the HaloPlex
Target Enrichment System-Fast Protocol, version D.5.

Sequencing. DNA (500 ng) from each sample was quantified with a Qubit Fluorometer (Invitrogen) and
used in the capture reaction. Each sample had a unique index. Libraries were quantified by Qubit,
pooled, and run in an Illumina HiSeq 2500 rapid-run flow cell using the on-board cluster method
for paired-end sequencing (2 × 100 bp reads).

Analysis. Sequencing results were demultiplexed and converted to FASTQ format using Illumina
BCL2FASTQ software. The reads were adapter and quality trimmed with Trimmomatic (82) and then
aligned to the human genome (UCSC build hg19) using the Burrows-Wheeler Aligner (83). Further
local indel realignment and base-quality score recalibration were performed using the Genome
Analysis Toolkit (GATK) (84). Single-nucleotide variation and indel calls were generated with the
GATK HaplotypeCaller. ANNOVAR (85) was used to annotate variants with functional consequence on
genes as well as to identify the presence of these variants in dbSNP 137, the 1000 Genomes
Project, ESP6500 (National Heart, Lung, and Blood Institute [NHLBI] GO Exome Sequencing Project),
and COSMIC 67.


Genome-wide  DNA  methylation  by  ERRBS 

High-molecular-weight genomic DNA (25 ng) was used to perform the
ERRBS assay as previously described (45) and was sequenced on an
Illumina HiSeq 2000. Reads were aligned against a bisulfite-converted
human genome (hg18) using Bowtie and Bismark (86). Downstream
analysis was performed using R statistical software (version 3.0.3) (87),
Bioconductor 2.13 (88), and the MethylSig 0.1.3 package (51). Only
genomic regions with coverage ranging from 10 to 500 times were
used for the downstream analysis. DMRs were identified by first summarizing
the methylation status of the genomic regions into 25-bp tiles
and then identifying regions with an absolute methylation difference of
25% or more and an FDR of less than 10%. DMRs were annotated to
the RefSeq genes (NCBI) using the following criteria: (a) DMRs overlapping
with a gene were annotated to that gene; (b) intergenic DMRs
were annotated to all neighboring genes within a 50-kb window; and
(c) if no gene was detected within a 50-kb window, then the DMR was
annotated to the nearest transcription start site (TSS).
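
The three annotation criteria above can be illustrated with a minimal Python
sketch. It is not the authors' R code. The Gene record, the half-open
coordinate convention, the reading of "within a 50-kb window" as a gap of at
most 50,000 bp, and the use of the DMR midpoint to find the nearest TSS are
all assumptions made for illustration.

```python
from dataclasses import dataclass

WINDOW = 50_000  # 50-kb window, as stated in the text


@dataclass(frozen=True)
class Gene:  # hypothetical record, not a RefSeq data structure
    name: str
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    strand: str  # "+" or "-"

    @property
    def tss(self) -> int:
        # TSS is the first transcribed base on the gene's strand
        return self.start if self.strand == "+" else self.end - 1


def interval_gap(a_start, a_end, b_start, b_end):
    """Distance in bp between two half-open intervals; 0 if they overlap."""
    return max(0, b_start - a_end + 1, a_start - b_end + 1)


def annotate_dmr(chrom, start, end, genes):
    """Return the gene names a DMR is annotated to under rules (a)-(c)."""
    same_chrom = [g for g in genes if g.chrom == chrom]

    # (a) DMRs overlapping a gene are annotated to that gene
    overlapping = [g.name for g in same_chrom
                   if interval_gap(start, end, g.start, g.end) == 0]
    if overlapping:
        return overlapping

    # (b) intergenic DMRs: all neighboring genes within the 50-kb window
    nearby = [g.name for g in same_chrom
              if interval_gap(start, end, g.start, g.end) <= WINDOW]
    if nearby:
        return nearby

    # (c) otherwise: the gene with the nearest TSS (assumed: from DMR midpoint)
    if not same_chrom:
        return []
    mid = (start + end) // 2
    return [min(same_chrom, key=lambda g: abs(g.tss - mid)).name]
```

Because the rules are applied in order, a DMR that overlaps one gene is never
also assigned to a second gene lying within 50 kb of it.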

Methylation  classifier 

SVM (53) was applied using the R package e1071 (89) to classify the 2
groups of patients (responders and nonresponders), in which the
percentage of methylation of the 25-bp tiles was used as a predictor.
The probability mode and sigmoid kernel were used in the SVM
function; otherwise, the default parameters were applied. We performed
2-step feature selection for the SVM classifier: (a) 25-bp
tiles were prefiltered by nominal P values of less than 0.05 and by an
absolute methylation difference greater than 20%, calculated using
the MethylSig package (51); (b) greedy forward-feature selection was
applied on the remaining tiles. Briefly, we assessed and prioritized
the predictability of each of the filtered tiles in the SVM model and
then sequentially evaluated the combinatorial predictability of the
tiles by adding 1 tile from the prioritized tiles to the classifier at a
time. The final predictors of the SVM classifier were selected from
the set of tiles that could optimally predict patient response. The
predictability was assessed on the basis of 10-fold cross-validation.
Specifically, we randomly partitioned the 39 samples for which ERRBS
libraries were available into 10 complementary subsets, training the
SVM model on 9 of the 10 subsets (called the training set) and predicting
the classes (responder or nonresponder) on the 1 left-out subset
(called the validation set or testing set). To reduce variability, 10
rounds of cross-validation were performed using different partitions,
and the validation results were summarized over the rounds. During
each round of validation, the probability of each sample being predicted
as a responder was recorded, and then the ROC-AUC across
10 rounds was calculated with the R package ROCR (90), and this
calculation was used as the assessment of the predictability. Complete
code is provided in the Supplemental Methods.
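
The selection and validation scheme described above can be sketched in Python
as follows. This is an illustration only and is not the Supplemental Methods
code. The classifier is passed in as a function (fit_predict), so no SVM is
implemented here, and every function name is hypothetical. Two readings of the
text are assumed: predictions from all rounds are pooled before a single
ROC-AUC is computed, and the final tile set is the best-scoring prefix of the
individually ranked tiles.

```python
import random


def kfold_indices(n, k, rng):
    """Randomly partition sample indices 0..n-1 into k complementary subsets."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (responder, nonresponder) pairs ranked
    correctly; ties count one half. Labels: 1 = responder, 0 = nonresponder."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def repeated_cv_auc(x, y, fit_predict, k=10, rounds=10, seed=0):
    """Repeated k-fold cross-validation with a fresh partition per round.

    fit_predict(train_x, train_y, test_x) must return one responder
    probability per held-out sample (any classifier can be plugged in).
    """
    rng = random.Random(seed)
    pooled_labels, pooled_scores = [], []
    for _ in range(rounds):
        for fold in kfold_indices(len(x), k, rng):
            held_out = set(fold)
            train = [i for i in range(len(x)) if i not in held_out]
            probs = fit_predict([x[i] for i in train],
                                [y[i] for i in train],
                                [x[i] for i in fold])
            pooled_labels.extend(y[i] for i in fold)
            pooled_scores.extend(probs)
    # Assumption: pool predictions over all rounds, then compute one AUC
    return roc_auc(pooled_labels, pooled_scores)


def greedy_forward_select(n_features, score_subset):
    """Rank features by their individual score, add them one at a time in
    that order, and keep the prefix with the best combined score."""
    ranked = sorted(range(n_features),
                    key=lambda j: score_subset([j]), reverse=True)
    best_score, best_subset, current = float("-inf"), [], []
    for j in ranked:
        current.append(j)
        score = score_subset(current)
        if score > best_score:
            best_score, best_subset = score, list(current)
    return best_subset, best_score
```

In this sketch, score_subset would typically call repeated_cv_auc on the
columns named in the subset, so that each candidate tile set is judged by its
cross-validated ROC-AUC rather than by its fit to the training data.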

EpiTYPER  MassARRAY 

Validation of CpG methylation of select genomic regions was performed
by MALDI-TOF using EpiTYPER MassARRAY (Sequenom)
(49)  on  bisulfite-converted  genomic  DNA  from  a  subset  of  DAC 
responders  and  nonresponders.  The  primers  used  to  amplify  these 
genomic  regions  and  the  resultant  amplicon  sequences  are  listed  in 
Supplemental  Table  6. 

RNA-seq 

RNA-seq was performed on RNA samples from 14 patients (8 responders
and 6 nonresponders) who had high-quality RNA (RNA integrity
number >6 as determined by the Agilent 2100 Bioanalyzer). RNA-seq
libraries were prepared using the Illumina TruSeq RNA Sample
Prep Kit (version 2) according to the manufacturer’s instructions. A
set of synthetic RNAs from the ERCC (91) at known concentrations
was mixed with each of the cDNA libraries. Four separate samples
were multiplexed into each lane and sequenced on a HiSeq 2000.
The quality of reads obtained was evaluated using FastQC (Babraham
Bioinformatics; http://www.bioinformatics.babraham.ac.uk/projects/fastqc/).
The sequenced libraries were aligned to the human genome
(hg18) or to the ERCC spike-in reference sequence using TopHat,
version 2.0.8 (92), with default parameters.

RNA-seq  analysis 

HTSeq (0.5.4p5) (93) was used to generate the count matrix with the
following parameters: “htseq-count --mode=union --stranded=no”
using the following 2 gene transfer format (GTF) annotation files,
respectively: (a) the hg18 RefSeq gene GTF file downloaded from the
UCSC genome browser for endogenous gene assembly; (b) the ERCC
spike-in transcript GTF file downloaded from the official website
(http://www.lifetechnologies.com/order/catalog/product/4456740)
for ERCC spike-in assembly. The endogenous gene counts were normalized
by ERCC spike-in library size, and the differential expression
analysis was performed using the edgeR (version 3.4.2; Bioconductor)
(94) generalized linear model (GLM). Genes with an absolute log2 fold
change greater than 1 and a P value of less than 0.05 were reported.
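
The normalization and reporting thresholds above can be illustrated with a
short Python sketch. It does not reproduce the edgeR GLM. It assumes that
"normalized by ERCC spike-in library size" means scaling each sample's
endogenous counts by that sample's total spike-in read count, and the function
names, the per-million scale, and the result records are hypothetical.

```python
def spikein_normalize(gene_counts, ercc_counts, scale=1_000_000):
    """Express one sample's endogenous counts per `scale` spike-in reads.

    gene_counts: {gene: raw count} for endogenous genes
    ercc_counts: {spike-in transcript: raw count} for the same sample
    """
    spikein_library_size = sum(ercc_counts.values())
    if spikein_library_size == 0:
        raise ValueError("no spike-in reads; cannot normalize")
    return {gene: count * scale / spikein_library_size
            for gene, count in gene_counts.items()}


def reported_genes(results, lfc_cutoff=1.0, p_cutoff=0.05):
    """Apply the reporting rule from the text: |log2 fold change| > 1 and
    P < 0.05. `results` is a list of dicts with keys gene, log2fc, and p."""
    return [r["gene"] for r in results
            if abs(r["log2fc"]) > lfc_cutoff and r["p"] < p_cutoff]
```

Both cutoffs are strict inequalities, so a gene with a log2 fold change of
exactly 1 or a P value of exactly 0.05 is not reported.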


qRT-PCR 

To validate the RNA-seq results, RNA from selected nonresponder and
responder patients was reverse transcribed using the Verso cDNA synthesis
kit (Thermo Scientific) with random hexamer primers, according
to the manufacturer’s instructions. qRT-PCR was performed on
the resulting cDNA in triplicate using intron-spanning and -flanking
primer sets with Fast SYBR Green Master Mix and the StepOne Plus
PCR System (Applied Biosystems) according to the manufacturer’s
instructions. The primer sequences are listed in Supplemental Table 7.

ELISAs 

ELISAs  for  CXCL4  and  CXCL7/NAP2  on  serum  from  the  CMML 
patients  were  performed  using  the  corresponding  ELISA  kits 
(RAB0402 and RAB0135) from Sigma-Aldrich according to the manufacturer’s
directions. For CXCL4, the serum was diluted 1:500 in the
sample  dilution  buffer  provided  in  the  kit. 

IHC 

For immunostaining, 3-μm-thick formalin-fixed, paraffin-embedded
BM sections were deparaffinized in xylenes and hydrated in graded
alcohols. Antigen retrieval was performed in EDTA (1 mM, pH 8.0) for
two 15-minute cycles at maximum power in a microwave oven, and
slides were then incubated with a CXCL4 antibody (1:300, catalog
500-P05; PeproTech) or a CXCL7 antibody (1:50, catalog orb13423;
Biorbyt). Immunostaining was performed with the BenchMark histostainer
(Ventana Medical Systems, Roche) using a peroxidase detection
kit with DAB substrate according to standard procedures. Sections
were then counterstained with hematoxylin.

Cell  culture  and  colony-forming  assays 

CD34+  cells  were  isolated  from  cryopreserved  BM  MNCs  from  femoral 
head specimens using the CD34 MicroBead Isolation Kit (Miltenyi Biotec)
according to the manufacturer’s instructions. For CMML cells, the
cryopreserved BM MNCs were rapidly thawed at 37°C and treated with
DNase to prevent cell clumping. Cells were plated in prestimulation
media  containing  IMDM  with  20%  BIT  (STEMCELL  Technologies);  IL-6 
(20  ng/ml);  SCF  (100  ng/ml);  TPO  (100  ng/ml);  and  FLT3L  (10  ng/ml) 
(PeproTech)  and  recovered  overnight.  The  following  day,  either  CXCL4 
(50  ng/ml;  PeproTech);  CXCL7  (50  ng/ml;  PeproTech);  a  combination  of 
both  chemokines  (50  ng/ml  each);  or  vehicle  (PBS  containing  0.1%  BSA) 
was  added  as  well  as  freshly  prepared  DAC  (10  nM)  (Sigma-Aldrich)  or 
vehicle  (water).  DAC  was  replenished  daily  for  a  total  of  3  days.  Live  cell 
numbers  and  viability  were  determined  by  trypan  blue  exclusion. 

For  colony  assays,  an  equal  number  of  live,  treated  CD34+  cells 
were  plated  in  duplicate  in  H4435  Enriched  MethoCult  (STEMCELL 
Technologies).  Colonies  were  counted  after  12  to  15  days. 

Apoptosis  assays 

Apoptosis was assessed using the Tali Apoptosis Kit with annexin V
Alexa Fluor 488 and propidium iodide according to the manufacturer’s
instructions and was measured on a Tali Image-Based Cytometer
(all from Life Technologies).

Accession  numbers 

FISM cohort ERRBS and RNA-seq data are deposited in the NCBI’s Gene
Expression  Omnibus  (GEO)  database  (GEO  GSE61163).  GFM  cohort 
ERRBS  data  are  also  deposited  in  the  GEO  database  (GEO  GSE63787). 

Statistics

For the analysis of clinical parameters, Fisher’s exact test was used for
CMML type and sex; unpaired, 2-tailed Student’s t tests were used for
clinical parameters with a normal distribution; Wilcoxon signed-rank
tests were used when the samples were not normally distributed; and
the log-rank test was used for survival. A P value of less than 0.05 was
considered significant. The distribution of somatic mutations between
nonresponders and responders was evaluated using Fisher’s exact test, and
significance was considered at a P value of less than 0.05. For in vitro cell
culture and colony-forming experiments, unpaired, 2-tailed Student’s
t tests were used for comparisons, and significance was considered at
a P value of less than 0.05. For correlation analysis between the RNA-seq
and qPCR results, Pearson’s correlation was performed, and the r
values and P values are indicated in the figures. The ERRBS and RNA-seq
analyses were performed using a beta binomial test for differential
methylation and a generalized linear model likelihood ratio test
for differential gene expression. These methods were implemented
through specific algorithms that are described in detail in their respective
sections above.
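
As an illustration of how Fisher’s exact test compares mutation status
between responders and nonresponders, the following Python sketch computes a
2-sided P value for a 2 × 2 table. It is a minimal stand-in for a statistics
package and is not the authors' code. The function name is hypothetical, the
counts in the usage note are invented, and the 2-sided rule used here (summing
all tables no more probable than the observed one) is an assumed convention.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """2-sided Fisher's exact test for the 2 x 2 table

                     mutated   wild type
        responders      a          b
        nonresponders   c          d

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Probability that x of the col1 mutated patients fall in row 1
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    tolerance = 1 + 1e-9  # guards against floating-point ties
    p_value = sum(prob(x) for x in range(lowest, highest + 1)
                  if prob(x) <= p_observed * tolerance)
    return min(1.0, p_value)
```

For example, with invented counts of 3 mutated and 1 wild-type responders
versus 1 mutated and 3 wild-type nonresponders,
fisher_exact_two_sided(3, 1, 1, 3) gives 34/70, which is above the 0.05
significance threshold used in this section.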